Attention-Gated Convolutional Networks for Scanner-Agnostic Quality Assessment

arXiv cs.CV / April 17, 2026


Key Points

  • The paper introduces a hybrid CNN-and-attention architecture for structural MRI quality assessment, designed to be scanner- and site-invariant in the face of motion artifacts that can compromise clinical diagnostics and large-scale automated analyses.
  • The model combines a hierarchical 2D CNN encoder with multi-head cross-attention to focus on motion-relevant artifact signatures (e.g., ringing and blurring) while suppressing site-specific intensity differences and background noise.
  • Training is performed end-to-end on the MR-ART dataset using 200 subjects, and evaluation is split into “seen site” testing and “unseen site” testing on heterogeneous ABIDE sites.
  • On seen sites, the method reaches very high scan-level performance (accuracy 0.9920, F1-score 0.9919), and it also shows strong domain-shift robustness on unseen sites (accuracy 0.755) without retraining or fine-tuning.
  • The authors conclude that attention-based feature re-weighting can learn universal artifact descriptors that generalize across imaging environments and scanner manufacturers, helping reduce reliance on manual QC.
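The paper does not publish implementation details beyond the summary above, but the core re-weighting idea, a learned query attending over CNN feature-map patches so that artifact-relevant regions dominate the pooled descriptor, can be illustrated with a minimal NumPy sketch. The shapes, head count, and the "learned artifact query" below are hypothetical choices for illustration, not the authors' configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query, keys, values, n_heads=4):
    """Pool patch features into one descriptor via multi-head
    scaled dot-product cross-attention (illustrative sketch)."""
    d = query.shape[-1]
    dh = d // n_heads
    heads = []
    for h in range(n_heads):
        q = query[:, h * dh:(h + 1) * dh]            # (1, dh)
        k = keys[:, h * dh:(h + 1) * dh]             # (P, dh)
        v = values[:, h * dh:(h + 1) * dh]           # (P, dh)
        attn = softmax(q @ k.T / np.sqrt(dh))        # (1, P) weights over patches
        heads.append(attn @ v)                       # (1, dh) re-weighted features
    return np.concatenate(heads, axis=-1)            # (1, d)

# Toy usage: 64 patch embeddings of dim 32, as a CNN encoder might emit.
rng = np.random.default_rng(0)
patches = rng.standard_normal((64, 32))
query = rng.standard_normal((1, 32))   # hypothetical learned "artifact query"
pooled = multi_head_cross_attention(query, patches, patches)
print(pooled.shape)  # → (1, 32)
```

In a trained model the query would be a learned parameter, so patches whose keys resemble artifact signatures (ringing, blurring) receive high attention weights while site-specific intensity patterns are down-weighted; the pooled vector then feeds a classification head.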

Abstract

Motion artifacts present a significant challenge in structural MRI (sMRI), often compromising clinical diagnostics and large-scale automated analysis. While manual quality control (QC) remains the gold standard, it is increasingly unscalable for massive longitudinal studies. To address this, we propose a hybrid CNN-Attention framework designed for robust, site-invariant MRI quality assessment. Our architecture integrates a hierarchical 2D CNN encoder for local spatial feature extraction with a multi-head cross-attention mechanism to model global dependencies. This synergy enables the model to prioritize motion-relevant artifact signatures, such as ringing and blurring, while dynamically filtering out site-specific intensity variations and background noise. The framework was trained end-to-end on the MR-ART dataset using a balanced cohort of 200 subjects. Performance was evaluated across two tiers: Seen Site Evaluation on a held-out MR-ART partition, and Unseen Site Evaluation using 200 subjects from 17 heterogeneous sites in the ABIDE archive. On seen sites, the model achieved a scan-level accuracy of 0.9920 and an F1-score of 0.9919. Crucially, it maintained strong generalization across unseen ABIDE sites (accuracy 0.755) without any retraining or fine-tuning, demonstrating high resilience to domain shift. These results indicate that attention-based feature re-weighting successfully captures universal artifact descriptors, bridging the performance gap between diverse imaging environments and scanner manufacturers.