Mochi: Aligning Pre-training and Inference for Efficient Graph Foundation Models via Meta-Learning

arXiv cs.AI / 4/27/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper introduces Mochi, a graph foundation model that uses meta-learning to better unify training and inference for multiple downstream tasks.
  • Earlier graph foundation approaches rely on reconstruction-based pre-training (e.g., link prediction) followed by a separate post-hoc “unification” step, which the authors show can limit downstream performance.
  • Mochi instead pre-trains on few-shot episodes that match the downstream evaluation protocol, so the training objective is aligned with how inference will be performed (see the episode-sampling sketch after this list).
  • Experiments on both synthetic and real-world benchmarks show Mochi and its stronger variant Mochi++ outperform or match existing graph foundation models on 25 datasets across node classification, link prediction, and graph classification.
  • The proposed method also substantially reduces compute, requiring 8–27× less training time than the strongest baseline while maintaining competitive (or better) accuracy.
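
The episodic pre-training idea can be illustrated with a small sketch. The snippet below is hypothetical (plain Python, made-up function and variable names, not the authors' implementation): it samples an N-way K-shot episode from labeled nodes, producing the same support/query split that a few-shot evaluation protocol would use at inference time.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way=5, k_shot=3, n_query=5, rng=random):
    """Sample one N-way K-shot episode from a dict of node_id -> class label.

    Returns (support, query), each a list of (node_id, episode_class) pairs,
    where episode_class is the index of the class within this episode.
    """
    by_class = defaultdict(list)
    for node, label in labels.items():
        by_class[label].append(node)

    # Keep only classes with enough labeled nodes for both support and query.
    eligible = [c for c, nodes in by_class.items() if len(nodes) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)

    support, query = [], []
    for episode_class, c in enumerate(classes):
        nodes = rng.sample(by_class[c], k_shot + n_query)
        support += [(n, episode_class) for n in nodes[:k_shot]]
        query += [(n, episode_class) for n in nodes[k_shot:]]
    return support, query

# Toy example: 60 nodes over 6 classes, sampled into a 5-way 3-shot episode.
labels = {i: i % 6 for i in range(60)}
support, query = sample_episode(labels)
print(len(support), len(query))  # 15 support and 25 query examples
```

Training on many such episodes, rather than on a reconstruction objective, is what aligns the pre-training loss with the few-shot inference setup described in the abstract below.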

Abstract

We propose Mochi, a Graph Foundation Model that addresses task unification and training efficiency by adopting a meta-learning-based training framework. Prior models pre-train with reconstruction-based objectives such as link prediction, and assume that the resulting representations can be aligned with downstream tasks through a separate unification step such as class prototypes. We demonstrate through synthetic and real-world experiments that this procedure, while simple and intuitive, has limitations that directly affect downstream task performance. To address these limitations, Mochi pre-trains on few-shot episodes that mirror the downstream evaluation protocol, aligning the training objective with inference rather than relying on a post-hoc unification step. We show that Mochi, along with its more powerful variant Mochi++, achieves competitive or superior performance compared to existing Graph Foundation Models across 25 real-world graph datasets spanning node classification, link prediction, and graph classification, while requiring 8–27 times less training time than the strongest baseline.
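
For contrast, the post-hoc "unification" step the abstract criticizes is commonly a nearest-prototype readout over frozen embeddings. The sketch below is a generic NumPy illustration of that baseline procedure under assumed names (prototype_classify is hypothetical and not taken from the paper): average the support embeddings per class, then assign each query to its nearest prototype.

```python
import numpy as np

def prototype_classify(support_emb, support_labels, query_emb):
    """Nearest-prototype classification over frozen, pre-trained embeddings.

    support_emb: (n_support, d) embeddings from a pre-trained encoder.
    support_labels: (n_support,) integer class labels.
    query_emb: (n_query, d) embeddings of the queries to classify.
    Returns predicted labels for the queries.
    """
    classes = np.unique(support_labels)
    # One prototype per class: the mean of that class's support embeddings.
    prototypes = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in classes]
    )
    # Assign each query to the class of its nearest prototype (Euclidean distance).
    dists = np.linalg.norm(query_emb[:, None, :] - prototypes[None, :, :], axis=-1)
    return classes[dists.argmin(axis=1)]

# Toy example with two well-separated classes in 2-D.
rng = np.random.default_rng(0)
support_emb = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(1, 0.1, (5, 2))])
support_labels = np.array([0] * 5 + [1] * 5)
query_emb = np.array([[0.05, -0.02], [0.98, 1.01]])
print(prototype_classify(support_emb, support_labels, query_emb))  # [0 1]
```

The paper's point is that the encoder feeding this readout was optimized for reconstruction, not for prototype-based classification, so the two stages can be misaligned; Mochi's episodic pre-training removes that gap by training directly against the few-shot protocol.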