Curia-2: Scaling Self-Supervised Learning for Radiology Foundation Models

arXiv cs.CV / 4/3/2026


Key Points

  • The paper introduces Curia-2, an improved self-supervised pre-training strategy for radiology foundation models that targets better representation quality for complex CT/MRI volumes.
  • The approach scales Vision Transformer architectures to billion-parameter sizes, which the authors describe as a first for multi-modal CT and MRI foundation models (see the parameter-count sketch after this list).
  • The authors extend and restructure CuriaBench into two distinct tracks: a 2D track for slice-based vision models and a 3D track for volumetric benchmarking.
  • Results indicate Curia-2 outperforms prior foundation models on vision-focused tasks and performs competitively with vision-language models on clinically complex objectives such as finding detection.
  • The work states that model weights will be made publicly available to support additional research.
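
To put the billion-parameter claim in context, the short Python sketch below estimates encoder parameter counts for standard Vision Transformer configurations. The ViT-L/16, ViT-g/14, and ViT-G/14 hyperparameters used here are the commonly published ones and are only an assumption about the scale Curia-2 reaches; the paper's exact architecture is not given in this summary.

def vit_encoder_params(depth, dim, mlp_dim, patch=16, channels=3, img=224):
    """Approximate parameter count of a plain ViT encoder (no task head)."""
    n_tokens = (img // patch) ** 2 + 1                        # patches + [CLS]
    patch_embed = patch * patch * channels * dim + dim        # linear patch projection
    pos_and_cls = n_tokens * dim + dim                        # positional embeddings + [CLS] token
    attn = 4 * dim * dim + 4 * dim                            # QKV + output projection (with biases)
    mlp = 2 * dim * mlp_dim + dim + mlp_dim                   # two linear layers (with biases)
    norms = 4 * dim                                           # two LayerNorms per block
    block = attn + mlp + norms
    return patch_embed + pos_and_cls + depth * block + 2 * dim  # + final LayerNorm

# Roughly: ViT-L/16 ~0.30B, ViT-g/14 ~1.0B, ViT-G/14 ~1.8B encoder parameters.
print(f"ViT-L/16: {vit_encoder_params(24, 1024, 4096, patch=16) / 1e9:.2f}B")
print(f"ViT-g/14: {vit_encoder_params(40, 1408, 6144, patch=14) / 1e9:.2f}B")
print(f"ViT-G/14: {vit_encoder_params(48, 1664, 8192, patch=14) / 1e9:.2f}B")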

Abstract

The rapid growth of medical imaging has fueled the development of Foundation Models (FMs) to reduce the growing, unsustainable workload on radiologists. While recent FMs have shown the power of large-scale pre-training for CT and MRI analysis, there remains significant room to optimize how these models learn from complex radiological volumes. Building upon the Curia framework, this work introduces Curia-2, which significantly improves the original pre-training strategy and representation quality to better capture the specificities of radiological data. The proposed methodology enables scaling the architecture up to billion-parameter Vision Transformers, marking a first for multi-modal CT and MRI FMs. Furthermore, we formalize the evaluation of these models by extending and restructuring CuriaBench into two distinct tracks: a 2D track tailored for slice-based vision models and a 3D track for volumetric benchmarking. Our results demonstrate that Curia-2 outperforms all FMs on vision-focused tasks and fares competitively with vision-language models on clinically complex tasks such as finding detection. Weights will be made publicly available to foster further research.
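
The abstract does not describe how the 2D and 3D tracks evaluate a backbone, so the sketch below shows only one plausible protocol: per-slice embeddings from a frozen 2D encoder are mean-pooled into a volume-level feature, which is then scored with a linear probe. The pooling choice, feature shapes, and probe are illustrative assumptions, not CuriaBench's actual procedure.

import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def embed_volume(backbone, volume):
    """volume: (num_slices, channels, H, W) tensor of CT/MRI slices.
    Assumes the frozen backbone maps a slice batch to (num_slices, feat_dim)."""
    backbone.eval()
    slice_feats = backbone(volume)     # per-slice embeddings
    return slice_feats.mean(dim=0)     # mean-pool slices into one volume-level vector

def linear_probe(train_x, train_y, test_x, test_y):
    """Fit a linear classifier on frozen features and report test accuracy."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_x, train_y)
    return clf.score(test_x, test_y)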