MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

arXiv cs.CL / 4/29/2026


Key Points

  • The paper addresses a key limitation in LLM-based molecule understanding: existing systems often lack fine-grained alignment between molecules and the specific caption phrases that describe their properties.
  • It introduces “fine-grained alignments” as explicit correspondences between a molecule’s substructures and the textual phrases explaining those properties, aiming to improve accuracy and explainability.
  • To avoid costly expert annotations, the authors propose MolReFlect, a teacher–student framework where a teacher LLM generates and refines substructure-to-phrase mappings and then trains a student LLM on these detailed alignments.
  • Experiments show MolReFlect achieves state-of-the-art results on the molecule-caption translation task and can outperform prior baselines.
  • The authors release their code on GitHub to support reproduction and further research.
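
The teacher-student pipeline summarized above can be sketched in toy form. Everything below is an illustrative assumption, not the paper's actual implementation: the teacher is mocked with a fixed mapping (the real teacher is an LLM whose mappings are further refined before training), and the prompt layout is hypothetical.

```python
# Toy sketch of a MolReFlect-style teacher-student loop (assumptions, not the paper's code).

def teacher_extract_alignments(smiles: str, caption: str) -> list[tuple[str, str]]:
    """Mock teacher: a real teacher LLM would be prompted to map caption
    phrases to SMILES substructures. Here we return a fixed toy mapping."""
    return [
        ("C(=O)O", "carboxylic acid group"),
        ("c1ccccc1", "benzene ring"),
    ]

def build_training_example(smiles: str, caption: str,
                           alignments: list[tuple[str, str]]) -> dict:
    """Fold the fine-grained alignments into the student's prompt so the
    student LLM is trained with explicit substructure-to-phrase hints."""
    hints = "; ".join(f"{sub} -> {phrase}" for sub, phrase in alignments)
    prompt = (
        f"Molecule (SMILES): {smiles}\n"
        f"Fine-grained alignments: {hints}\n"
        f"Describe the molecule."
    )
    return {"prompt": prompt, "target": caption}

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
caption = ("The molecule contains a carboxylic acid group "
           "attached to a benzene ring.")
alignments = teacher_extract_alignments(smiles, caption)
example = build_training_example(smiles, caption, alignments)
print(example["prompt"])
```

In the paper's framing, the key design choice is that the alignments appear explicitly in the student's training context, rather than the molecule being treated as a monolithic input.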

Abstract

Molecule discovery is a pivotal research field, impacting everything from medicine to materials. Recently, Large Language Models (LLMs) have been widely adopted for molecular understanding and generation, serving as a bridge between the molecular space and the natural language space, yet the alignment between molecules and their corresponding captions remains a significant challenge. Previous endeavors typically treat molecules as monolithic inputs, lacking an intermediate reasoning process and sacrificing explainability. In this work, we define fine-grained alignments as the precise correspondence between a molecule's sub-structures and the textual phrases that explain their properties. These alignments are crucial for LLMs to understand molecules in a more accurate and explainable manner. Normally, such fine-grained alignments require expert annotation, which is both costly and time-consuming. To allow LLMs to automatically label and learn the fine-grained alignments, we propose MolReFlect, a novel teacher-student framework, where a teacher LLM first generates and refines mappings between caption phrases and SMILES substructures and then explicitly teaches these detailed alignments to a student LLM. Experimental results demonstrate that MolReFlect enables LLMs to significantly outperform previous baselines, achieving state-of-the-art performance on the molecule-caption translation task. Our code is available at: https://github.com/phenixace/MolReFlect.