In-Context Molecular Property Prediction with LLMs: A Blinding Study on Memorization and Knowledge Conflicts
arXiv cs.LG / 3/30/2026
Key Points
- The paper studies whether LLMs genuinely perform in-context molecular property regression or mainly rely on memorization, addressing concerns about benchmark contamination.
- It runs progressively blinded experiments that reduce the information accessible to the model, disentangling the effects of pre-trained knowledge from those of the in-context examples.
- Nine LLM variants from the GPT-4.1, GPT-5, and Gemini 2.5 families are evaluated on three MoleculeNet datasets (Delaney solubility, Lipophilicity, QM7 atomization energy).
- The experiments include controlled in-context sample sizes (0-, 60-, and 1000-shot) to test how the amount of provided context affects performance and potential memorization behavior.
- The authors propose a principled evaluation framework to assess molecular property prediction under controlled information access and to surface conflicts between pre-training and in-context learning.
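The blinding idea described above can be illustrated with a minimal prompt-construction sketch. This is not the authors' actual protocol; it is a hypothetical example in which a k-shot regression prompt is built from (SMILES, value) pairs, and a `blind` flag replaces each SMILES string with an opaque ID so the model cannot draw on any memorized association with the molecule and must rely purely on the in-context mapping. All function and variable names are invented for illustration.

```python
import random

def build_kshot_prompt(examples, query_smiles, k, blind=False, seed=0):
    """Build a k-shot regression prompt from (SMILES, value) pairs.

    Hypothetical sketch: with blind=True, SMILES strings are replaced
    by opaque IDs (MOL_0, MOL_1, ...) so no molecular identity leaks
    into the prompt; the model sees only the in-context input-output
    pairs, not recognizable structures it may have memorized.
    """
    rng = random.Random(seed)
    shots = rng.sample(examples, k) if k < len(examples) else list(examples)
    lines = ["Predict the property value for the final molecule."]
    for i, (smiles, value) in enumerate(shots):
        mol = f"MOL_{i}" if blind else smiles
        lines.append(f"Molecule: {mol}\nValue: {value:.3f}")
    query = "MOL_QUERY" if blind else query_smiles
    lines.append(f"Molecule: {query}\nValue:")
    return "\n\n".join(lines)

# Toy solubility-style examples (values are illustrative, not from the paper)
examples = [("CCO", -0.77), ("c1ccccc1", -2.13), ("CC(=O)O", 0.09)]
prompt = build_kshot_prompt(examples, "CCN", k=2, blind=True)
```

A 0-shot condition corresponds to `k=0` (only the query line is sent), while the unblinded condition passes `blind=False`; comparing model error across these settings is one way to surface whether performance comes from pre-training or from the provided context.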