MosaicMRI: A Diverse Dataset and Benchmark for Raw Musculoskeletal MRI

arXiv stat.ML / 4/14/2026


Key Points

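  • MosaicMRI is presented as a large, diversity-focused public benchmark of fully sampled raw MRI measurements of the musculoskeletal (MSK) system, containing 2,671 volumes / 80,156 slices.
  • The data cover a wide range of imaging planes (axial, sagittal, etc.), contrasts (PD/T1/T2), target anatomies (spine, knee, hip, ankle, etc.), and numbers of acquisition coils. The goal is to address the limits of evaluation imposed by existing datasets' focus on brain and knee imaging.
  • Using VarNet as the baseline for accelerated reconstruction, the authors systematically study scaling behavior with respect to both model capacity and dataset size. In low-sample regimes, models trained on a mix of anatomies tend to outperform anatomy-specific models.
  • Cross-anatomy generalization (e.g., training on spine and evaluating on knee) is also evaluated. The study identifies groups of body parts that generalize well to each other, such as foot and elbow, and shows that performance under domain shift depends on a combination of training-set size, anatomy, and protocol-specific factors.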
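Because the measurements are fully sampled, accelerated-reconstruction experiments of this kind usually simulate acceleration by retrospectively discarding k-space lines. The NumPy sketch below shows that step together with a zero-filled root-sum-of-squares (RSS) image and an NMSE score. It assumes a fastMRI-style random Cartesian column mask, multi-coil k-space shaped (coils, rows, cols) with a centered FFT convention, and the hypothetical function names `cartesian_mask`, `undersample`, `zero_filled_rss`, and `nmse`. The summary does not state MosaicMRI's mask type, acceleration factors, or file layout, and VarNet itself (a learned unrolled network) is not shown here.

```python
import numpy as np


def cartesian_mask(num_cols: int, acceleration: int, center_fraction: float,
                   seed: int = 0) -> np.ndarray:
    """Hypothetical fastMRI-style random 1D mask over phase-encode columns.

    A fully sampled low-frequency band of round(num_cols * center_fraction)
    columns is always kept; the remaining columns are kept at random so that
    about num_cols / acceleration columns are sampled in expectation.
    """
    rng = np.random.default_rng(seed)
    num_low = int(round(num_cols * center_fraction))
    prob = (num_cols / acceleration - num_low) / (num_cols - num_low)
    mask = rng.uniform(size=num_cols) < prob
    start = (num_cols - num_low + 1) // 2
    mask[start:start + num_low] = True  # always keep the k-space center
    return mask


def undersample(kspace: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero the unsampled columns of multi-coil k-space shaped (coils, rows, cols)."""
    return kspace * mask[np.newaxis, np.newaxis, :]


def zero_filled_rss(kspace: np.ndarray) -> np.ndarray:
    """Centered inverse 2D FFT per coil, then root-sum-of-squares over coils."""
    coil_images = np.fft.fftshift(
        np.fft.ifft2(np.fft.ifftshift(kspace, axes=(-2, -1)), norm="ortho"),
        axes=(-2, -1),
    )
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))


def nmse(target: np.ndarray, pred: np.ndarray) -> float:
    """Normalized mean squared error ||target - pred||^2 / ||target||^2."""
    return float(np.linalg.norm(target - pred) ** 2 / np.linalg.norm(target) ** 2)


# Usage on a synthetic 4-coil slice (random numbers, not MosaicMRI data).
rng = np.random.default_rng(1)
images = rng.standard_normal((4, 64, 64)) + 1j * rng.standard_normal((4, 64, 64))
kspace = np.fft.fftshift(
    np.fft.fft2(np.fft.ifftshift(images, axes=(-2, -1)), norm="ortho"),
    axes=(-2, -1),
)
mask = cartesian_mask(64, acceleration=4, center_fraction=0.08)
target = zero_filled_rss(kspace)
recon = zero_filled_rss(undersample(kspace, mask))
print(f"sampled {mask.sum()}/64 columns, NMSE = {nmse(target, recon):.3f}")
```

A learned model such as VarNet would take the undersampled k-space (and mask) as input and be trained to approach `target`; the zero-filled image is only the naive starting point it is meant to improve on.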

Abstract

Deep learning underpins a wide range of applications in MRI, including reconstruction, artifact removal, and segmentation. However, progress has been driven largely by public datasets focused on brain and knee imaging, shaping how models are trained and evaluated. As a result, careful studies of the reliability of these models across diverse anatomical settings remain limited. In this work, we introduce MosaicMRI, a large and diverse collection of fully sampled raw musculoskeletal (MSK) MR measurements designed for training and evaluating machine-learning-based methods. MosaicMRI is the largest open-source raw MSK MRI dataset to date, comprising 2,671 volumes and 80,156 slices. The dataset offers substantial diversity in volume orientation (e.g., axial, sagittal), imaging contrasts (e.g., PD, T1, T2), anatomies (e.g., spine, knee, hip, ankle, and others), and numbers of acquisition coils. Using VarNet as a baseline for the accelerated reconstruction task, we perform a comprehensive set of experiments to study scaling behavior with respect to both model capacity and dataset size. Interestingly, models trained on the combined anatomies significantly outperform anatomy-specific models in low-sample regimes, highlighting the benefits of anatomical diversity and the presence of exploitable cross-anatomical correlations. We further evaluate robustness and cross-anatomy generalization by training models on one anatomy (e.g., spine) and testing them on another (e.g., knee). Notably, we identify groups of body parts (e.g., foot and elbow) that generalize well with each other, and highlight that performance under domain shifts depends on training set size, anatomy, and protocol-specific factors.
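The cross-anatomy experiments amount to a train-anatomy × test-anatomy score matrix. The abstract does not say how groups such as foot and elbow were identified, so the sketch below shows only one hypothetical way to surface such groups from a matrix of that kind: normalize each test column by its in-domain score, link two anatomies when transfer in both directions stays above a threshold, and take connected components with a small union-find. The function names `relative_transfer` and `mutual_groups`, the 0.95 threshold, and the assumption of a higher-is-better metric such as SSIM are illustrative choices, not the authors' procedure.

```python
import numpy as np


def relative_transfer(scores: np.ndarray) -> np.ndarray:
    """scores[i, j]: higher-is-better metric (e.g. SSIM) on test anatomy j for a
    model trained on anatomy i. Divide each column j by the in-domain score
    scores[j, j], so 1.0 means "as good as training on anatomy j itself"."""
    return scores / np.diag(scores)[np.newaxis, :]


def mutual_groups(scores: np.ndarray, names: list[str],
                  threshold: float = 0.95) -> list[list[str]]:
    """Group anatomies that transfer well in BOTH directions (hypothetical rule)."""
    rel = relative_transfer(scores)
    parent = list(range(len(names)))

    def find(a: int) -> int:
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            # Link i and j only if each transfers well to the other.
            if rel[i, j] >= threshold and rel[j, i] >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Singletons are anatomies that did not pair well with any other.
    return [g for g in groups.values() if len(g) > 1]
```

Requiring both directions reflects the "generalize well with each other" phrasing; a one-directional rule would also flag anatomies that only serve as good sources. For a lower-is-better metric such as NMSE, the ratio would need to be inverted (scores[j, j] / scores[i, j]) before thresholding.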