RobustMedSAM: Degradation-Resilient Medical Image Segmentation via Robust Foundation Model Adaptation

arXiv cs.CV / April 14, 2026


Key Points

  • RobustMedSAM addresses the gap in SAM-based medical image segmentation where performance drops under realistic corruptions like noise, blur, motion artifacts, and modality-specific distortions.
  • The work identifies a complementary split of responsibilities in SAM: the image encoder carries medical priors while the mask decoder drives corruption robustness.
  • RobustMedSAM uses module-wise checkpoint fusion, initializing the image encoder from MedSAM and the mask decoder from RobustSAM under a shared ViT-B architecture, then fine-tunes only the mask decoder on 35 MedSegBench datasets spanning six modalities and 12 corruption types.
  • Freezing the other components aims to preserve pretrained medical representations while improving robustness; the paper also explores an SVD-based parameter-efficient variant for limited encoder adaptation.
  • Experiments on in- and out-of-distribution benchmarks show degraded-image Dice improves from 0.613 (SAM) to 0.719 (+0.106), indicating the fusion strategy is practical for robust medical segmentation.
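The module-wise checkpoint fusion in the key points above can be pictured as a key-prefix merge over two checkpoints' state dicts. A minimal sketch follows; the prefix names (`image_encoder.`, `mask_decoder.`, `prompt_encoder.`) and the fallback of remaining modules to MedSAM are assumptions for illustration, since the summary does not spell them out.

```python
def fuse_checkpoints(medsam_state, robustsam_state):
    """Module-wise checkpoint fusion (sketch).

    Take mask decoder weights from RobustSAM (the module credited with
    corruption robustness) and everything else, including the image
    encoder (which carries the medical priors), from MedSAM. Both
    checkpoints must share the same ViT-B architecture so keys line up.
    """
    fused = {}
    for key, weight in medsam_state.items():
        if key.startswith("mask_decoder."):
            fused[key] = robustsam_state[key]  # robustness module
        else:
            fused[key] = weight  # medical-domain modules (assumed fallback)
    return fused

# Toy example with scalars standing in for weight tensors.
medsam = {"image_encoder.w": 1.0, "mask_decoder.w": 2.0, "prompt_encoder.w": 3.0}
robust = {"image_encoder.w": 9.0, "mask_decoder.w": 8.0, "prompt_encoder.w": 7.0}
fused = fuse_checkpoints(medsam, robust)
```

After fusion, training would update only the `mask_decoder.*` parameters while the rest stay frozen, matching the paper's stated recipe.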

Abstract

Medical image segmentation models built on the Segment Anything Model (SAM) achieve strong performance on clean benchmarks, yet their reliability often degrades under realistic image corruptions such as noise, blur, motion artifacts, and modality-specific distortions. Existing approaches address either medical-domain adaptation or corruption robustness, but not both jointly. In SAM, we find that these capabilities are concentrated in complementary modules: the image encoder preserves medical priors, while the mask decoder governs corruption robustness. Motivated by this observation, we propose RobustMedSAM, which adopts module-wise checkpoint fusion by initializing the image encoder from MedSAM and the mask decoder from RobustSAM under a shared ViT-B architecture. We then fine-tune only the mask decoder on 35 medical datasets from MedSegBench, spanning six imaging modalities and 12 corruption types, while freezing the remaining components to preserve pretrained medical representations. We additionally investigate an SVD-based parameter-efficient variant for limited encoder adaptation. Experiments on both in-distribution and out-of-distribution benchmarks show that RobustMedSAM improves degraded-image Dice from 0.613 to 0.719 (+0.106) over SAM, demonstrating that structured fusion of complementary pretrained models is an effective and practical approach for robust medical image segmentation.
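The abstract mentions an SVD-based parameter-efficient variant for limited encoder adaptation without detailing it. One common recipe in this family decomposes a frozen weight matrix and fine-tunes only its singular values, shrinking the trainable parameter count from the full matrix to a single vector. The sketch below illustrates that idea under those assumptions; it is not necessarily the paper's exact formulation.

```python
import numpy as np

def svd_reparameterize(W, k=None):
    """Split a weight matrix into fixed singular vectors and a small
    trainable vector of singular values (illustrative sketch only).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    if k is not None:  # optionally keep only the top-k components
        U, s, Vt = U[:, :k], s[:k], Vt[:k, :]
    return U, s, Vt  # freeze U and Vt; fine-tune s

def reconstruct(U, s, Vt):
    """Rebuild the effective weight from (possibly updated) singular values."""
    return (U * s) @ Vt  # equals U @ diag(s) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
U, s, Vt = svd_reparameterize(W)
# With all components kept, the reconstruction matches the original weight;
# adaptation would then train only len(s) parameters instead of W.size.
```

This keeps the pretrained encoder's subspaces intact while allowing a limited, low-dimensional adaptation, which is consistent with the paper's goal of preserving medical representations.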