MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

arXiv cs.CL / 4/29/2026


Key Points

  • The paper addresses a key limitation in LLM-based molecule understanding: existing systems often lack fine-grained alignment between molecules and the specific caption phrases that describe their properties.
  • It introduces “fine-grained alignments” as explicit correspondences between a molecule’s substructures and the textual phrases explaining those properties, aiming to improve accuracy and explainability.
  • To avoid costly expert annotations, the authors propose MolReFlect, a teacher–student framework where a teacher LLM generates and refines substructure-to-phrase mappings and then trains a student LLM on these detailed alignments.
  • Experiments show MolReFlect achieves state-of-the-art results on the molecule-caption translation task and can outperform prior baselines.
  • The authors release their code on GitHub to support reproduction and further research.
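
The teacher-student pipeline summarized above can be sketched in toy form. Everything below is an illustrative assumption, not the paper's actual implementation: the teacher is mocked with a fixed mapping (the real teacher is an LLM whose mappings are further refined before training), and the prompt layout is hypothetical.

```python
# Toy sketch of a MolReFlect-style teacher-student loop (assumptions, not the paper's code).

def teacher_extract_alignments(smiles: str, caption: str) -> list[tuple[str, str]]:
    """Mock teacher: a real teacher LLM would be prompted to map caption
    phrases to SMILES substructures. Here we return a fixed toy mapping."""
    return [
        ("C(=O)O", "carboxylic acid group"),
        ("c1ccccc1", "benzene ring"),
    ]

def build_training_example(smiles: str, caption: str,
                           alignments: list[tuple[str, str]]) -> dict:
    """Fold the fine-grained alignments into the student's prompt so the
    student LLM is trained with explicit substructure-to-phrase hints."""
    hints = "; ".join(f"{sub} -> {phrase}" for sub, phrase in alignments)
    prompt = (
        f"Molecule (SMILES): {smiles}\n"
        f"Fine-grained alignments: {hints}\n"
        f"Describe the molecule."
    )
    return {"prompt": prompt, "target": caption}

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
caption = ("The molecule contains a carboxylic acid group "
           "attached to a benzene ring.")
alignments = teacher_extract_alignments(smiles, caption)
example = build_training_example(smiles, caption, alignments)
print(example["prompt"])
```

In the paper's framing, the key design choice is that the alignments appear explicitly in the student's training context, rather than the molecule being treated as a monolithic input.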

Abstract

Molecule discovery is a pivotal research field, impacting everything from medicine to materials. Recently, Large Language Models (LLMs) have been widely adopted for molecular understanding and generation, serving as a bridge between the molecular space and the natural language space, yet the alignment between molecules and their corresponding captions remains a significant challenge. Previous endeavors typically treat molecules as monolithic inputs, lacking an intermediate reasoning process and sacrificing explainability. In this work, we define fine-grained alignments as the precise correspondence between a molecule's sub-structures and the textual phrases that explain their properties. These alignments are crucial for LLMs to understand molecules in a more accurate and explainable manner. Normally, such fine-grained alignments require expert annotation, which is both costly and time-consuming. To allow LLMs to automatically label and learn the fine-grained alignments, we propose MolReFlect, a novel teacher-student framework, where a teacher LLM first generates and refines mappings between caption phrases and SMILES substructures and then explicitly teaches these detailed alignments to a student LLM. Experimental results demonstrate that MolReFlect enables LLMs to significantly outperform previous baselines, achieving state-of-the-art performance on the molecule-caption translation task. Our code is available at: https://github.com/phenixace/MolReFlect.