Autotuning T-PaiNN: Enabling Data-Efficient GNN Interatomic Potential Development via Classical-to-Quantum Transfer Learning

arXiv cs.LG · March 27, 2026


Key Points

  • The paper introduces Transfer-PaiNN (T-PaiNN), a transfer learning framework to improve data efficiency in GNN-based machine-learned interatomic potentials (MLIPs) by using classical force-field data for pretraining.
  • T-PaiNN pretrains a PaiNN GNN model on large classical molecular simulation datasets, then performs fine-tuning (“autotuning”) with a much smaller DFT dataset to achieve quantum-level accuracy.
  • Experiments on QM9 (gas-phase) and liquid water (condensed phase) show order-of-magnitude reductions in mean absolute error compared with models trained only on DFT data.
  • In low-data regimes, the paper reports error reductions of up to 25× and faster training convergence, indicating that extensive classical sampling lets the model learn general features of the potential energy surface before quantum refinement.
  • The authors argue the framework is a practical and computationally efficient strategy for developing high-accuracy, data-efficient MLIPs that can broaden MLIP applicability to more complex chemical systems.
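The two-stage workflow above (pretrain on cheap classical labels, then fine-tune on a small quantum-accurate set) can be sketched with a deliberately toy model. This is not the paper's PaiNN architecture or training setup: the "model" here is a hypothetical linear energy regressor over an 8-dimensional descriptor, the optimizer is plain gradient descent, and the "classical" and "quantum" surfaces are synthetic, with the quantum surface defined as the classical one plus a small correction. The sketch only illustrates the transfer-learning logic: a pretrained model needs to close a much smaller gap during DFT fine-tuning than a model trained from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # size of a toy per-structure descriptor (hypothetical)

# Synthetic ground truth: the "quantum" energy surface is the
# "classical" surface plus a small correction, mimicking the idea
# that classical force fields capture the coarse shape of the PES.
w_classical = rng.normal(size=d)
w_quantum = w_classical + 0.05 * rng.normal(size=d)

def fit(w, X, y, lr, steps):
    """Plain gradient descent on mean-squared error (toy stand-in
    for the paper's actual optimizer and loss)."""
    for _ in range(steps):
        w = w - lr * 2.0 * X.T @ (X @ w - y) / len(X)
    return w

# Stage 1: pretrain on abundant, cheap classical force-field labels.
X_cls = rng.normal(size=(500, d))
w_pre = fit(np.zeros(d), X_cls, X_cls @ w_classical, lr=0.05, steps=300)

# Stage 2: fine-tune ("autotune") on a small DFT-labelled set, with a
# reduced learning rate so the correction refines, not overwrites, the prior.
X_dft = rng.normal(size=(20, d))
y_dft = X_dft @ w_quantum
w_transfer = fit(w_pre, X_dft, y_dft, lr=0.02, steps=200)

# Baseline: train from scratch on the same small DFT set only.
w_scratch = fit(np.zeros(d), X_dft, y_dft, lr=0.02, steps=200)

err_transfer = np.linalg.norm(w_transfer - w_quantum)
err_scratch = np.linalg.norm(w_scratch - w_quantum)
```

In this sketch the pretrained model starts fine-tuning already close to the quantum surface, so `err_transfer` ends up well below `err_scratch` despite both seeing the same 20 "DFT" samples, which is the data-efficiency effect the paper quantifies for real GNN potentials.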

Abstract

Machine-learned interatomic potentials (MLIPs), particularly graph neural network (GNN)-based models, offer a promising route to achieving near-density functional theory (DFT) accuracy at significantly reduced computational cost. However, their practical deployment is often limited by the large volumes of expensive quantum mechanical training data required. In this work, we introduce a transfer learning framework, Transfer-PaiNN (T-PaiNN), that substantially improves the data efficiency of GNN-MLIPs by leveraging inexpensive classical force field data. The approach consists of pretraining a PaiNN MLIP architecture on large-scale datasets generated from classical molecular simulations, followed by fine-tuning (dubbed autotuning) using a comparatively small DFT dataset. We demonstrate the effectiveness of autotuning T-PaiNN on both gas-phase molecular systems (QM9 dataset) and condensed-phase liquid water. Across all cases, T-PaiNN significantly outperforms models trained solely on DFT data, achieving order-of-magnitude reductions in mean absolute error while accelerating training convergence. For example, using the QM9 data set, error reductions of up to 25 times are observed in low-data regimes, while liquid water simulations show improved predictions of energies, forces, and experimentally relevant properties such as density and diffusion. These gains arise from the model's ability to learn general features of the potential energy surface from extensive classical sampling, which are subsequently refined to quantum accuracy. Overall, this work establishes transfer learning from classical force fields as a practical and computationally efficient strategy for developing high-accuracy, data-efficient GNN interatomic potentials, enabling broader application of MLIPs to complex chemical systems.