Are Natural-Domain Foundation Models Effective for Accelerated Cardiac MRI Reconstruction?

arXiv cs.LG / 4/27/2026


Key Points

  • The study tests whether natural-domain foundation models (e.g., CLIP, DINOv2) can act as effective image priors for accelerated cardiac MRI reconstruction, compared with domain-specific approaches like BiomedCLIP.
  • It introduces an unrolled reconstruction framework that uses pretrained, frozen visual encoders inside each reconstruction cascade to guide the image formation process.
  • Experiments show that task-specific state-of-the-art reconstruction models (such as E2E-VarNet) can outperform foundation-model-based methods on standard in-distribution data.
  • In cross-domain evaluations (training on cardiac MRI and testing on knee/brain datasets), foundation-model-based methods demonstrate better robustness, especially at high acceleration rates and with limited low-frequency sampling.
  • The work concludes that natural-image-pretrained models learn transferable structural representations that improve generalization, while domain-specific pretraining (BiomedCLIP) adds only modest further gains in the most ill-posed cases.
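The unrolled, encoder-guided cascade described above can be sketched as a sequence of gradient steps that alternate a k-space data-consistency term with a prior term. This is a minimal illustration under stated assumptions, not the paper's implementation: `frozen_encoder_grad` is a hypothetical stand-in for the guidance a frozen visual encoder (CLIP, DINOv2, BiomedCLIP) would inject into each cascade, replaced here by a simple Laplacian smoothness gradient, and the forward operator is a single-coil masked Fourier transform rather than the multi-coil setup used in E2E-VarNet-style models.

```python
import numpy as np

def fft2c(x):
    # Centered, orthonormal 2-D FFT (image -> k-space)
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(x), norm="ortho"))

def ifft2c(k):
    # Centered, orthonormal inverse 2-D FFT (k-space -> image)
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(k), norm="ortho"))

def frozen_encoder_grad(x):
    # HYPOTHETICAL stand-in for the prior gradient a frozen foundation-model
    # encoder would supply; here just a Laplacian smoothness penalty gradient.
    lap = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
           np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4 * x)
    return -lap

def unrolled_recon(y, mask, n_cascades=8, step=1.0, lam=0.05):
    """Unrolled reconstruction: each cascade applies one gradient step of
    ||mask * F(x) - y||^2 plus a (stubbed) frozen-encoder prior term."""
    x = ifft2c(y)  # zero-filled initialization
    for _ in range(n_cascades):
        # Data-consistency gradient: A^H (A x - y), with A = mask * FFT
        dc = ifft2c(mask * fft2c(x) - y)
        x = x - step * (dc + lam * frozen_encoder_grad(x))
    return x
```

With `lam=0` the cascades reduce to pure data consistency and leave the zero-filled image unchanged (the residual `mask * F(x) - y` is already zero there); the prior term is what fills in the unsampled k-space lines, which is where the choice of encoder matters.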

Abstract

The emergence of large-scale pretrained foundation models has transformed computer vision, enabling strong performance across diverse downstream tasks. However, their potential for physics-based inverse problems, such as accelerated cardiac MRI reconstruction, remains largely underexplored. In this work, we investigate whether natural-domain foundation models can serve as effective image priors for accelerated cardiac MRI reconstruction, and compare their performance against domain-specific counterparts such as BiomedCLIP. We propose an unrolled reconstruction framework that incorporates pretrained, frozen visual encoders, such as CLIP, DINOv2, and BiomedCLIP, within each cascade to guide the reconstruction process. Through extensive experiments, we show that while task-specific state-of-the-art reconstruction models such as E2E-VarNet achieve superior performance in standard in-distribution settings, foundation-model-based approaches remain competitive. More importantly, in challenging cross-domain scenarios, where models are trained on cardiac MRI and evaluated on anatomically distinct knee and brain datasets, foundation models exhibit improved robustness, particularly under high acceleration factors and limited low-frequency sampling. We further observe that natural-image-pretrained models, such as CLIP, learn highly transferable structural representations, while domain-specific pretraining (BiomedCLIP) provides modest additional gains in more ill-posed regimes. Overall, our results suggest that pretrained foundation models offer a promising source of transferable priors, enabling improved robustness and generalization in accelerated MRI reconstruction.