A Dual Cross-Attention Graph Learning Framework For Multimodal MRI-Based Major Depressive Disorder Detection

arXiv cs.CV · April 14, 2026


Key Points

  • The paper introduces a dual cross-attention multimodal fusion framework that models bidirectional interactions between structural MRI (sMRI) and resting-state fMRI (rs-fMRI) representations for major depressive disorder (MDD) detection (a minimal sketch of this fusion pattern follows the list).
  • Experiments on the large-scale REST-meta-MDD dataset evaluate the method with structural and functional brain atlas configurations using 10-fold stratified cross-validation.
  • Results show the proposed approach delivers robust and competitive performance across atlas types, improving over simple feature-level concatenation for functional atlases while remaining comparable for structural atlases.
  • The best-performing model reports 84.71% accuracy, 86.42% sensitivity, 82.89% specificity, 84.34% precision, and 85.37% F1-score, highlighting the value of explicitly learning cross-modal relationships.
  • The study argues that cross-modal interaction modeling is critical for multimodal neuroimaging-based classification where single-modality signals are insufficient.
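
Below is a minimal sketch of the dual cross-attention fusion pattern from the first key point, written in PyTorch. It assumes each modality has already been encoded into a token sequence of shape (batch, tokens, dim); the class name, default sizes, and mean-pooling readout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DualCrossAttentionFusion(nn.Module):
    """Hypothetical bidirectional cross-attention between two modality streams."""

    def __init__(self, embed_dim: int = 128, num_heads: int = 4):
        super().__init__()
        # Direction 1: sMRI tokens query the rs-fMRI tokens.
        self.smri_to_fmri = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Direction 2: rs-fMRI tokens query the sMRI tokens.
        self.fmri_to_smri = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(embed_dim)
        self.norm_f = nn.LayerNorm(embed_dim)

    def forward(self, smri: torch.Tensor, fmri: torch.Tensor) -> torch.Tensor:
        # Each stream attends over the other modality's keys/values.
        s_att, _ = self.smri_to_fmri(query=smri, key=fmri, value=fmri)
        f_att, _ = self.fmri_to_smri(query=fmri, key=smri, value=smri)
        # Residual connections preserve each modality's original signal.
        s_out = self.norm_s(smri + s_att)
        f_out = self.norm_f(fmri + f_att)
        # Pool over tokens and concatenate for a downstream classifier head.
        return torch.cat([s_out.mean(dim=1), f_out.mean(dim=1)], dim=-1)
```

The feature-level concatenation baseline the paper compares against would skip both attention calls and concatenate the pooled inputs directly.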

Abstract

Major depressive disorder (MDD) is a prevalent mental disorder associated with complex neurobiological changes that cannot be fully captured by a single imaging modality. Multimodal magnetic resonance imaging (MRI) provides a more comprehensive picture of these brain changes by combining structural and functional data, yet effectively integrating the two modalities remains challenging. In this study, we propose a dual cross-attention-based multimodal fusion framework that explicitly models bidirectional interactions between structural MRI (sMRI) and resting-state functional MRI (rs-fMRI) representations. The approach is evaluated on the large-scale REST-meta-MDD dataset using both structural and functional brain atlas configurations. Extensive experiments under 10-fold stratified cross-validation demonstrate that the proposed fusion framework achieves robust and competitive performance across all atlas types: it consistently outperforms conventional feature-level concatenation for functional atlases while maintaining comparable performance for structural atlases. The most effective dual cross-attention multimodal model achieves 84.71% accuracy, 86.42% sensitivity, 82.89% specificity, 84.34% precision, and 85.37% F1-score. These findings underscore the importance of explicitly modeling cross-modal interactions for multimodal neuroimaging-based MDD classification.
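
As a reproducibility aid, the sketch below spells out the evaluation protocol named in the abstract: 10-fold stratified cross-validation with the five reported metrics derived from a per-fold confusion matrix. The model_factory callable and the label convention (1 = MDD, 0 = control) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix

def evaluate_10fold(model_factory, X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    """Average accuracy, sensitivity, specificity, precision, and F1 over 10 folds."""
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    fold_scores = []
    for train_idx, test_idx in skf.split(X, y):
        model = model_factory()  # fresh, untrained model for each fold
        model.fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[test_idx])
        # With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order.
        tn, fp, fn, tp = confusion_matrix(y[test_idx], y_pred, labels=[0, 1]).ravel()
        fold_scores.append({
            "accuracy": (tp + tn) / (tp + tn + fp + fn),
            "sensitivity": tp / (tp + fn),   # recall on the MDD class
            "specificity": tn / (tn + fp),   # recall on the control class
            "precision": tp / (tp + fp),
            "f1": 2 * tp / (2 * tp + fp + fn),
        })
    return {k: float(np.mean([s[k] for s in fold_scores])) for k in fold_scores[0]}
```

Per-fold metrics are averaged here; pooling predictions across folds before scoring is an alternative convention the abstract does not distinguish.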