FRIGID: Scaling Diffusion-Based Molecular Generation from Mass Spectra at Training and Inference Time

arXiv cs.LG / 4/21/2026

Key Points

  • The paper introduces FRIGID, a diffusion-based molecular generation framework that produces molecular structures from mass spectra using intermediate fingerprint representations and known chemical formulas.
  • FRIGID uses a novel diffusion language model design and is trained at scale on hundreds of millions of unlabeled molecular structures.
  • The authors propose an inference-time scaling method using forward fragmentation models to detect spectrum-inconsistent fragments and then refine them via targeted remasking and denoising.
  • Reported results show FRIGID achieves over 18% Top-1 accuracy on the MassSpecGym benchmark and triples Top-1 accuracy on NPLIB1 compared with leading methods, with performance scaling approximately log-linearly as inference-time compute increases.
  • The authors release the FRIGID code publicly, supporting reproducibility and further research into compute-scaled de novo structural elucidation.
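The detect-then-refine loop described in the key points can be illustrated with a toy sketch. This is not the FRIGID implementation: `refine`, `toy_find_inconsistent`, and `make_toy_denoiser` are hypothetical names, the checker is a stand-in for a forward fragmentation model (here it simply compares against a hidden reference string, which a real system would not have), and the denoiser is a deterministic placeholder for a diffusion model.

```python
from typing import Callable, List, Optional, Set

MASK = None  # marks a position that has been remasked
Tokens = List[Optional[str]]


def refine(
    tokens: Tokens,
    find_inconsistent: Callable[[Tokens], Set[int]],
    denoise: Callable[[Tokens], Tokens],
    rounds: int = 4,
) -> Tokens:
    """Remask only the flagged positions and re-denoise them, up to `rounds` times."""
    current = list(tokens)
    for _ in range(rounds):
        bad = find_inconsistent(current)
        if not bad:
            break  # nothing flagged, so stop spending compute
        masked = [MASK if i in bad else t for i, t in enumerate(current)]
        current = denoise(masked)
    return current


def toy_find_inconsistent(tokens: Tokens) -> Set[int]:
    """Toy checker (assumption): flags positions that differ from a hidden reference."""
    reference = ["C", "C", "O"]
    return {i for i, (t, r) in enumerate(zip(tokens, reference)) if t != r}


def make_toy_denoiser(alphabet=("C", "N", "O")) -> Callable[[Tokens], Tokens]:
    """Toy denoiser (assumption): fills masked slots by cycling through an alphabet."""
    state = {"step": 0}

    def denoise(masked: Tokens) -> Tokens:
        fill = alphabet[state["step"] % len(alphabet)]
        state["step"] += 1
        return [fill if t is MASK else t for t in masked]

    return denoise


result = refine(["C", "N", "N"], toy_find_inconsistent, make_toy_denoiser(), rounds=4)
print(result)  # ['C', 'C', 'O']
```

The `rounds` argument plays the role of the inference-time compute budget in this toy: with `rounds=1` the sketch stops at `['C', 'C', 'C']`, while more rounds let it repair the remaining flagged position.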

Abstract

In this work, we present FRIGID, a framework with a novel diffusion language model that generates molecular structures conditioned on mass spectra via intermediate fingerprint representations and determined chemical formulae, training at the scale of hundreds of millions of unlabeled structures. We then demonstrate how forward fragmentation models enable inference-time scaling by identifying spectrum-inconsistent fragments and refining them through targeted remasking and denoising. While FRIGID already achieves strong performance with its diffusion base, inference-time scaling significantly improves its accuracy, surpassing 18% Top-1 accuracy on the challenging MassSpecGym benchmark and tripling the Top-1 accuracy of the leading methods on NPLIB1. Further empirical analyses show that FRIGID exhibits log-linear performance scaling with increasing inference-time compute, opening a promising new direction for continued improvements in de novo structural elucidation. FRIGID code is publicly available at https://github.com/coleygroup/FRIGID
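As a rough illustration of what log-linear scaling means, accuracy can be modeled as `a + b * log(N)` for compute budget `N` and fitted by ordinary least squares. The sketch below is an assumption-laden example, not the paper's analysis: `fit_log_linear` is a hypothetical helper, and the data points are synthetic values generated from known coefficients rather than results reported for FRIGID.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(budgets: Sequence[float], scores: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of score = a + b * ln(budget); returns (a, b).

    Requires at least two distinct positive budgets.
    """
    xs = [math.log(n) for n in budgets]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(scores) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic, illustrative data only: generated from a = 0.10, b = 0.02.
budgets = [1, 2, 4, 8, 16]
synthetic_scores = [0.10 + 0.02 * math.log(n) for n in budgets]

a, b = fit_log_linear(budgets, synthetic_scores)
print(f"a={a:.3f}, b={b:.3f}")
```

Under this model each doubling of the budget adds a constant `b * ln(2)` to the score, which is why a log-linear trend appears as a straight line when compute is plotted on a logarithmic axis.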