AI Navigate

Generalist Large Language Models for Molecular Property Prediction: Distilling Knowledge from Specialist Models

arXiv cs.LG / 3/16/2026


Key Points

  • TreeKD transfers complementary knowledge from tree-based specialist models into LLMs by verbalizing their learned predictive rules in natural language and adding them to the LLM's context.
  • Specialist decision trees are trained on functional group features and their rules are verbalized to enable rule-augmented context learning in LLMs.
  • A test-time rule-consistency technique ensembles predictions across diverse rules drawn from a Random Forest to improve robustness.
  • Experiments on 22 ADMET properties from the TDC benchmark show that TreeKD substantially improves LLM performance and narrows the gap to state-of-the-art specialist models.
  • The results advance toward practical generalist models for molecular property prediction.
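The core idea in the key points above, verbalizing a decision tree's learned rules as natural language, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the functional-group feature names, toy data, and rule phrasing are all hypothetical assumptions.

```python
# Hypothetical sketch: fit a small decision tree on binary functional-group
# indicator features, then render each root-to-leaf path as a natural-language
# rule that could augment an LLM's context. Data and names are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: rows are molecules, columns flag functional-group presence.
feature_names = ["a hydroxyl group", "a carboxyl group",
                 "an amine group", "an aromatic ring"]
X = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 1, 0]])
y = np.array([1, 0, 1, 0, 1, 0])  # toy binary property label

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

def verbalize(tree, feature_names):
    """Walk the fitted tree and emit one sentence per leaf (one per rule)."""
    t = tree.tree_
    rules = []

    def recurse(node, conditions):
        if t.children_left[node] == -1:  # -1 marks a leaf in sklearn trees
            label = int(np.argmax(t.value[node]))
            cond = " and ".join(conditions) or "for any molecule"
            outcome = "positive" if label else "negative"
            rules.append(f"If {cond}, predict {outcome}.")
            return
        name = feature_names[t.feature[node]]
        # Binary features: threshold 0.5 separates absent (left) from present.
        recurse(t.children_left[node], conditions + [f"the molecule lacks {name}"])
        recurse(t.children_right[node], conditions + [f"the molecule contains {name}"])

    recurse(0, [])
    return rules

for rule in verbalize(tree, feature_names):
    print(rule)
```

The verbalized rules can then be prepended to the LLM prompt alongside the SMILES string, which is how rule-augmented context learning is described here.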

Abstract

Molecular Property Prediction (MPP) is a central task in drug discovery. While Large Language Models (LLMs) show promise as generalist models for MPP, their current performance remains below the threshold for practical adoption. We propose TreeKD, a novel knowledge distillation method that transfers complementary knowledge from tree-based specialist models into LLMs. Our approach trains specialist decision trees on functional group features, then verbalizes their learned predictive rules as natural language to enable rule-augmented context learning. This enables LLMs to leverage structural insights that are difficult to extract from SMILES strings alone. We further introduce rule-consistency, a test-time scaling technique inspired by bagging that ensembles predictions across diverse rules from a Random Forest. Experiments on 22 ADMET properties from the TDC benchmark demonstrate that TreeKD substantially improves LLM performance, narrowing the gap with state-of-the-art specialist models and advancing toward practical generalist models for molecular property prediction.
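The rule-consistency idea described in the abstract, a bagging-inspired majority vote over predictions conditioned on different trees' rules, can be sketched as below. This is a minimal illustration under stated assumptions: `predict_with_rules` is a placeholder for an LLM call conditioned on one tree's verbalized rules (here it simply reuses the tree's own prediction so the sketch stays runnable), and the data is synthetic.

```python
# Minimal sketch of test-time rule-consistency: gather one prediction per
# rule set (one per tree in a Random Forest) and aggregate by majority vote,
# in the spirit of bagging. Entirely illustrative, not the paper's code.
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 4))     # binary functional-group features
y = (X[:, 0] | X[:, 2]).astype(int)      # toy property label

forest = RandomForestClassifier(
    n_estimators=7, max_depth=3, random_state=0
).fit(X, y)

def predict_with_rules(tree, x):
    """Placeholder for querying an LLM with one tree's verbalized rules.
    Here we stand in the tree's own prediction to keep the sketch self-contained."""
    return int(tree.predict(x.reshape(1, -1))[0])

def rule_consistency_predict(forest, x):
    """Majority vote over per-rule-set predictions (bagging-style ensemble)."""
    votes = [predict_with_rules(t, x) for t in forest.estimators_]
    return Counter(votes).most_common(1)[0][0]

x_new = np.array([1, 0, 0, 1])
print(rule_consistency_predict(forest, x_new))
```

In the described setting, each tree's rules would produce a distinct rule-augmented prompt, and the vote over the resulting LLM answers is what provides the robustness gain.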