SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Autoregressive Contrastive Learning
arXiv cs.LG / 3/27/2026
Key Points
- The paper addresses a key problem in string-based autoregressive molecular generation: the same molecular graph can correspond to multiple token sequences, so the latent representation of a partial graph depends on the order in which it was linearized. The authors call this mismatch between equivalent partial graphs latent “trajectory divergence.”
- It introduces Structure-Invariant Generative Molecular Alignment (SIGMA), which keeps linear string representations but uses a token-level contrastive objective to align latent states for prefixes that are consistent with identical suffixes while respecting geometric/structural symmetries.
- To improve inference efficiency and avoid redundant exploration, the authors propose Isomorphic Beam Search (IsoBeam), which dynamically prunes isomorphically equivalent paths during decoding.
- Experiments on standard benchmarks indicate SIGMA improves the balance between sequence scalability and graph fidelity, achieving better sample efficiency and structural diversity during multi-parameter optimization versus strong baselines.
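
The pruning idea behind IsoBeam can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `canonical_key`, `iso_beam_step`, and `toy_score` are hypothetical, and the equivalence test is a toy stand-in (a central atom whose substituents may be listed in any order) rather than a real isomorphism check on molecular graphs. The sketch only shows how a beam step might keep a single best hypothesis per equivalence class.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Tokens = Tuple[str, ...]
Hyp = Tuple[Tokens, float]  # (token sequence, cumulative log-probability)


def canonical_key(tokens: Tokens) -> Tokens:
    """Toy canonical form (assumption): the first token is a central atom and
    the remaining tokens are substituents attached to it, so their order does
    not change the partial graph or where the next token attaches."""
    if not tokens:
        return ()
    return (tokens[0],) + tuple(sorted(tokens[1:]))


def iso_beam_step(
    beam: List[Hyp],
    score_next: Callable[[Tokens, str], float],
    vocab: Sequence[str],
    width: int,
) -> List[Hyp]:
    """Expand every hypothesis by one token, keep only the best-scoring
    hypothesis per canonical key, then truncate to the beam width."""
    best: Dict[Tokens, Hyp] = {}
    for tokens, logp in beam:
        for tok in vocab:
            cand = tokens + (tok,)
            cand_logp = logp + score_next(tokens, tok)
            key = canonical_key(cand)
            # Equivalent paths collide on the same key; keep the better one.
            if key not in best or cand_logp > best[key][1]:
                best[key] = (cand, cand_logp)
    ranked = sorted(best.values(), key=lambda h: h[1], reverse=True)
    return ranked[:width]


# Hypothetical history-independent scorer, used only to drive the example.
LOGP = {"N": -0.5, "O": -1.0, "F": -2.0}


def toy_score(prefix: Tokens, token: str) -> float:
    return LOGP[token]


beam: List[Hyp] = [(("C",), 0.0)]
for _ in range(2):
    beam = iso_beam_step(beam, toy_score, list(LOGP), width=3)

print([("".join(t), s) for t, s in beam])
# [('CNN', -1.0), ('CNO', -1.5), ('COO', -2.0)]
```

Without the key-based pruning, a plain beam of width 3 would spend two of its slots on `CNO` and `CON`, which describe the same toy structure. In a realistic setting the key would presumably come from a canonical labeling of the partial molecular graph; that substitution is an assumption here, not something this summary specifies.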