Exascale Multi-Task Graph Foundation Models for Imbalanced, Multi-Fidelity Atomistic Data

arXiv cs.AI · April 20, 2026


Key Points

  • The paper introduces an exascale materials-discovery workflow using atomistic graph foundation models built on HydraGNN.
  • It jointly trains on 16 open first-principles datasets (544M+ structures, 85+ elements) with a multi-task architecture and a scalable ADIOS2/DDStore data pipeline.
  • On Frontier, the authors run six large-scale DeepHyper hyperparameter-optimization campaigns (FP64) and then train the best message-passing models in sustained 2,048-node runs, producing a PaiNN-based lead model.
  • The lead model supports billion-scale screening, evaluating 1.1B atomistic structures in about 50 seconds, and also serves as a starting point for data-scarce fine-tuning across diverse downstream tasks.
  • The work analyzes precision/compute tradeoffs across BF16/FP32/FP64 and demonstrates transfer to twelve chemically diverse downstream tasks while validating strong and weak scaling across Frontier, Aurora, and Perlmutter.
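The multi-task setup in the second bullet, a shared encoder feeding one output head per dataset, can be sketched minimally. This is a hedged illustration only: the names, dimensions, and the plain linear trunk are hypothetical, and the actual HydraGNN models use message-passing graph encoders (e.g. PaiNN), not the toy trunk below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration; not the paper's dimensions.
D_IN, D_HID = 8, 16

# Shared trunk weights (stand-in for the shared graph encoder).
W_shared = rng.normal(size=(D_IN, D_HID))

# One readout head per dataset, so targets with different units and
# fidelities from different first-principles codes stay separate.
heads = {name: rng.normal(size=(D_HID, 1))
         for name in ("dataset_a", "dataset_b")}

def predict(x, dataset):
    """Route an input through the shared trunk, then through the head
    belonging to the dataset the sample came from."""
    h = np.tanh(x @ W_shared)   # shared representation
    return h @ heads[dataset]   # dataset-specific prediction

x = rng.normal(size=(1, D_IN))
y_a = predict(x, "dataset_a")
y_b = predict(x, "dataset_b")
```

The point of the per-dataset heads is that gradients from all 16 datasets update the shared trunk, while each head absorbs dataset-specific label conventions.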

Abstract

We present an exascale workflow for materials discovery using atomistic graph foundation models built on HydraGNN. We jointly train on 16 open first-principles datasets (544+ million structures covering 85+ elements) using a multi-task architecture with per-dataset heads and a scalable ADIOS2/DDStore data pipeline. On Frontier, we execute six large-scale DeepHyper hyperparameter optimization campaigns in FP64 and promote the top-performing message-passing models to sustained 2,048-node training, yielding a PaiNN-based lead model. The resulting model enables billion-scale screening, evaluating 1.1 billion atomistic structures in 50 seconds, compressing a workload that would require years of first-principles computation, and supports data-scarce fine-tuning across diverse downstream tasks. We quantify precision-performance tradeoffs (BF16/FP32/FP64), demonstrate transfer across twelve chemically diverse downstream tasks, and establish seamless strong- and weak-scaling across Frontier, Aurora, and Perlmutter. This work allows fast and reliable exploration of vast chemical design spaces that are otherwise inaccessible to first-principles methods.
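The precision-performance tradeoff the abstract quantifies can be illustrated with a toy experiment: run the same forward pass at different floating-point precisions and measure the numeric drift against an FP64 reference. This sketch is not the paper's benchmark; numpy lacks BF16, so FP16 stands in for the low-precision end, and the model is a hypothetical chain of tanh layers.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(x, weights):
    """Toy stand-in for a model forward pass: a chain of tanh layers."""
    for W in weights:
        x = np.tanh(x @ W)
    return float(x.sum())

# Build reference weights and input in FP64.
weights64 = [rng.normal(size=(64, 64)) for _ in range(8)]
x64 = rng.normal(size=(1, 64))

# Evaluate the identical computation at three precisions.
y64 = forward(x64, weights64)
y32 = forward(x64.astype(np.float32),
              [W.astype(np.float32) for W in weights64])
y16 = forward(x64.astype(np.float16),
              [W.astype(np.float16) for W in weights64])

drift32 = abs(y64 - y32)  # FP32 error vs FP64 reference
drift16 = abs(y64 - y16)  # FP16 error vs FP64 reference
```

In a real screening workload the lower precisions buy throughput and memory headroom; the question the paper studies is whether the resulting drift stays within the accuracy budget of the downstream property predictions.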