Toward a Multi-View Brain Network Foundation Model: Cross-View Consistency Learning Across Arbitrary Atlases

arXiv cs.CV / March 24, 2026


Key Points

  • The paper introduces MV-BrainFM, a multi-view brain network foundation model aimed at learning generalizable representations from brain networks built using arbitrary atlases.
  • It uses Transformer-based modeling that explicitly incorporates anatomical distance information to guide inter-regional interactions (see the attention sketch after this list).
  • The method adds an unsupervised cross-view consistency learning strategy to align representations from multiple atlas views of the same subject into a shared latent space.
  • During pretraining, it jointly enforces within-view robustness and cross-view alignment (an illustrative objective is sketched after the abstract), and applies a unified multi-view paradigm that trains simultaneously across multiple datasets and atlases, more efficiently than sequential approaches.
  • Experiments on 20K+ subjects across 17 fMRI datasets show MV-BrainFM outperforming 14 prior brain network foundation models and task-specific baselines in both single-atlas and multi-atlas settings, with stable performance on unseen atlas configurations.
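
This summary does not say how anatomical distance enters the Transformer, but a common way to realize the idea is an additive, distance-dependent bias on the attention logits. Below is a minimal PyTorch sketch of such a module over region embeddings; the module name, the softplus-scaled per-head bias, and all shapes are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistanceBiasedAttention(nn.Module):
    """Self-attention over brain regions with an additive bias derived from
    inter-regional anatomical distance (hypothetical sketch; the paper's
    exact mechanism is not specified in this summary)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # Learnable per-head scale mapping distance to an attention penalty.
        self.dist_scale = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor, dist: torch.Tensor) -> torch.Tensor:
        # x: (batch, regions, dim); dist: (regions, regions) anatomical distances.
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        # Penalize attention between anatomically distant regions.
        bias = -F.softplus(self.dist_scale).view(1, -1, 1, 1) * dist.view(1, 1, N, N)
        attn = (attn + bias).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.proj(out)
```

Because the bias depends only on a pairwise distance matrix, the same module applies to brain networks from any atlas, regardless of region count; this is one plausible route to the atlas-agnostic behavior the paper claims.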

Abstract

Brain network analysis provides an interpretable framework for characterizing brain organization and has been widely used for neurological disorder identification. Recent advances in self-supervised learning have motivated the development of brain network foundation models. However, existing approaches are often limited by atlas dependency, insufficient exploitation of multiple network views, and weak incorporation of anatomical priors. In this work, we propose MV-BrainFM, a multi-view brain network foundation model designed to learn generalizable and scalable representations from brain networks constructed with arbitrary atlases. MV-BrainFM explicitly incorporates anatomical distance information into Transformer-based modeling to guide inter-regional interactions, and introduces an unsupervised cross-view consistency learning strategy to align representations from multiple atlases of the same subject in a shared latent space. By jointly enforcing within-view robustness and cross-view alignment during pretraining, the model effectively captures complementary information across heterogeneous network views while remaining atlas-aware. In addition, MV-BrainFM adopts a unified multi-view pretraining paradigm that enables simultaneous learning from multiple datasets and atlases, significantly improving computational efficiency compared to conventional sequential training strategies. The proposed framework also demonstrates strong scalability, consistently benefiting from increasing data diversity while maintaining stable performance across unseen atlas configurations. Extensive experiments on more than 20K subjects from 17 fMRI datasets show that MV-BrainFM consistently outperforms 14 existing brain network foundation models and task-specific baselines under both single-atlas and multi-atlas settings.
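
Concretely, the joint pretraining objective described above can be pictured as a sum of a within-view robustness term and a cross-view alignment term. The following is a minimal sketch, assuming a cosine robustness loss and an InfoNCE alignment loss with same-subject pairs as positives; the function name, the loss forms, and the `temperature`/`alpha` hyperparameters are assumptions, since the paper's actual losses are not given in this summary.

```python
import torch
import torch.nn.functional as F

def pretraining_loss(z_view1, z_view2, z_view1_aug, temperature=0.1, alpha=1.0):
    """Joint objective (hypothetical): within-view robustness (embedding of an
    augmented network stays close to the original) plus cross-view alignment
    (embeddings of the same subject under two atlases agree)."""
    z1 = F.normalize(z_view1, dim=-1)
    z2 = F.normalize(z_view2, dim=-1)
    z1a = F.normalize(z_view1_aug, dim=-1)
    # Within-view robustness: cosine agreement between clean and augmented view 1.
    robust = (1 - (z1 * z1a).sum(dim=-1)).mean()
    # Cross-view alignment: InfoNCE with same-subject pairs on the diagonal.
    logits = z1 @ z2.t() / temperature           # (batch, batch)
    targets = torch.arange(z1.size(0), device=z1.device)
    align = F.cross_entropy(logits, targets)
    return robust + alpha * align
```

Here `z_view1` and `z_view2` stand for subject-level embeddings of the same batch under two different atlases, and `z_view1_aug` for an embedding of a perturbed copy of view 1; all three are hypothetical handles. Optimizing both terms over batches drawn from many datasets and atlases at once would match the unified multi-view paradigm the abstract contrasts with sequential per-atlas training.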