Rhamba: Region-Aware Hybrid Attention-Mamba Framework for Self-Supervised Learning in Resting-State fMRI

arXiv cs.LG / 5/5/2026


Key Points

  • Rhamba is a new self-supervised pretraining framework for resting-state fMRI that combines anatomically guided, region-aware masking with hybrid Attention-Mamba sequence modeling.
  • Pretraining on the ABIDE dataset uses region-aligned patch embeddings and three masking strategies (Any, Majority, and Pure, in order of increasing spatial specificity; see the sketch after this list), and the paper evaluates four architectural variants: a Mamba-only model, an alternating Mamba/Attention model, and two hybrid encoder-decoder setups (AM and MA).
  • In downstream fine-tuning for schizophrenia and ADHD detection (COBRE and ADHD-200), the masking strategy affected reconstruction loss in a consistent order (Any > Majority > Pure) but produced only modest, dataset-dependent differences in downstream performance.
  • The MA (Mamba-Attention) hybrid configuration delivered the best average AUROC across both datasets, and region-wise analysis showed that peak performance depends on the interaction between masking strategy and architecture rather than any single dominant configuration; Integrated Gradients attribution was used to identify the brain regions driving predictions.
  • The authors claim Rhamba outperforms prior state-of-the-art approaches while offering a flexible trade-off among interpretability, scalability, and performance for large-scale fMRI representation learning.
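
For intuition, the following is a minimal sketch (not the authors' code) of how three region-aware masking rules of increasing spatial specificity could work: each patch is kept or masked depending on how much of it overlaps the atlas regions drawn for masking. The `patch_region_frac` representation and the thresholds are assumptions for illustration.

```python
import numpy as np

def region_aware_mask(patch_region_frac, masked_regions, strategy="Pure"):
    """patch_region_frac: (n_patches, n_regions) array with the fraction of
    each patch's voxels that fall in each atlas region; masked_regions:
    indices of the regions sampled for masking."""
    frac_in_masked = patch_region_frac[:, masked_regions].sum(axis=1)
    if strategy == "Any":        # mask a patch that touches a masked region at all
        return frac_in_masked > 0.0
    if strategy == "Majority":   # mask a patch mostly inside masked regions
        return frac_in_masked > 0.5
    if strategy == "Pure":       # mask only patches entirely inside masked regions
        return frac_in_masked >= 1.0
    raise ValueError(f"unknown strategy: {strategy}")

rng = np.random.default_rng(0)
frac = rng.dirichlet(np.ones(8), size=100)   # 100 toy patches over 8 regions
mask = region_aware_mask(frac, masked_regions=[1, 4], strategy="Majority")
```

Under this reading, "Any" masks the most patches and "Pure" the fewest, which is at least consistent with the reported ordering of reconstruction losses (Any > Majority > Pure).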

Abstract

Self-supervised pretraining is promising for large-scale neuroimaging, yet the impact of region-aware masking and hybrid sequence modeling remains underexplored. In this work, we introduce Rhamba, a region-aware pretraining framework that integrates anatomically guided masking with hybrid Attention-Mamba architectures for resting-state functional magnetic resonance imaging (fMRI) analysis. Models were pretrained on the ABIDE dataset using region-aligned patch embeddings and three masking strategies (Any, Majority, and Pure) with increasing spatial specificity. We evaluated four architectural variants: a Mamba-only model, an Alternate architecture with interleaved Mamba and Attention blocks, and two hybrid encoder-decoder configurations (Attention-Mamba (AM) and Mamba-Attention (MA)). The pretrained models were fine-tuned on downstream classification tasks using the COBRE and ADHD-200 datasets for schizophrenia and attention-deficit/hyperactivity disorder discrimination. We employed Integrated Gradients, an explainable AI method, to identify the brain regions contributing to model predictions. Masking strategy strongly influenced reconstruction behavior, with reconstruction loss following a consistent ordering (Any > Majority > Pure). However, this trend did not directly translate into downstream performance, where differences were modest and dataset-dependent. The hybrid architecture with the MA configuration achieved the highest average AUROC across both datasets, and Rhamba outperformed state-of-the-art methods in comparative evaluation. Region-wise analysis showed that peak performance depends on the interaction between masking strategy and architecture rather than a single dominant configuration. Overall, Rhamba offers a flexible framework for balancing interpretability, scalability, and performance in large-scale fMRI representation learning.
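
As a concrete illustration of the MA (Mamba-Attention) encoder-decoder idea, here is a minimal sketch, assuming PyTorch and the open-source mamba-ssm package; the layer counts, dimensions, and residual wiring are our assumptions, not details from the paper:

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # pip install mamba-ssm

class MAHybrid(nn.Module):
    """Toy MA configuration: Mamba blocks encode the masked sequence,
    attention blocks decode it back to patch space for reconstruction."""
    def __init__(self, patch_dim=200, d_model=256, n_enc=4, n_dec=2, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(patch_dim, d_model)   # region-aligned patch embedding
        self.encoder = nn.ModuleList(
            Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
            for _ in range(n_enc)
        )
        dec_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=n_dec)
        self.head = nn.Linear(d_model, patch_dim)    # reconstruct masked patches

    def forward(self, x):                            # x: (batch, seq_len, patch_dim)
        h = self.embed(x)
        for block in self.encoder:                   # linear-time Mamba encoder
            h = h + block(h)                         # residual around each block
        h = self.decoder(h)                          # self-attention decoder
        return self.head(h)
```

Swapping the order of the Mamba and attention stages would give the AM configuration, while the Alternate variant interleaves the two block types within a single stack.

For the explainability step, Integrated Gradients is available off the shelf; a minimal sketch with Captum follows (the paper does not name its implementation, and the stand-in classifier below is purely illustrative):

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Stand-in classifier; in practice this is the fine-tuned Rhamba model.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 200, 2))
model.eval()

ig = IntegratedGradients(model)
x = torch.randn(1, 64, 200)            # one (timepoints, features) fMRI sample
attr = ig.attribute(x, target=1)       # attribution toward the patient class
scores = attr.abs().sum(dim=(0, 1))    # per-feature attribution magnitude
```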