Leveraging Image Editing Foundation Models for Data-Efficient CT Metal Artifact Reduction

arXiv cs.CV / 4/8/2026


Key Points

  • The paper addresses metal artifact reduction in CT scans, noting that high-attenuation implants can severely degrade image quality and overwhelm standard deep learning approaches that need large paired datasets.
  • It reframes artifact reduction as an in-context reasoning task by adapting a general-purpose vision-language diffusion foundation model using parameter-efficient LoRA, cutting data needs to just 16–128 paired examples (about two orders of magnitude reduction).
  • The authors find that domain adaptation is essential to prevent hallucinations, because without adaptation the foundation model may misinterpret streak artifacts as real objects.
  • To better ground restored anatomy, they introduce a multi-reference conditioning strategy that supplies clean anatomical exemplars from other subjects alongside the corrupted input for category-specific inference.
  • Experiments on the AAPM CT-MAR benchmark report state-of-the-art results on perceptual and radiological-feature metrics, and the authors release their code.
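The data-efficiency claim rests on parameter-efficient LoRA fine-tuning: the pretrained weights stay frozen and only a low-rank update is trained. The paper does not publish its training code inline, so the following NumPy sketch only illustrates the generic LoRA idea (class and parameter names are ours, not the authors'): a frozen weight `W` plus a trainable update `B @ A` of rank `r`, zero-initialized so the adapted layer starts identical to the base model.

```python
import numpy as np

class LoRALinear:
    """Illustrative LoRA-style linear layer: y = x W^T + (alpha/r) * x A^T B^T.

    W is the frozen pretrained weight; only A and B (rank-r factors) would be
    trained, so the trainable parameter count drops from d_out*d_in to
    r*(d_in + d_out).
    """

    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))        # frozen base weight
        self.A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
        self.B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base path plus scaled low-rank correction.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)
```

Because `B` is zero-initialized, the layer reproduces the frozen model exactly before training, which is what makes LoRA a safe starting point for adapting a large foundation model on only 16–128 paired examples.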

Abstract

Metal artifacts from high-attenuation implants severely degrade CT image quality, obscuring critical anatomical structures and posing a challenge for standard deep learning methods that require extensive paired training data. We propose a paradigm shift: reframing artifact reduction as an in-context reasoning task by adapting a general-purpose vision-language diffusion foundation model via parameter-efficient Low-Rank Adaptation (LoRA). By leveraging rich visual priors, our approach achieves effective artifact suppression with only 16 to 128 paired training examples, reducing data requirements by two orders of magnitude. Crucially, we demonstrate that domain adaptation is essential for hallucination mitigation; without it, foundation models erroneously interpret streak artifacts as natural objects (e.g., waffles or petri dishes). To ground the restoration, we propose a multi-reference conditioning strategy where clean anatomical exemplars from unrelated subjects are provided alongside the corrupted input, enabling the model to exploit category-specific context to infer uncorrupted anatomy. Extensive evaluation on the AAPM CT-MAR benchmark demonstrates that our method achieves state-of-the-art performance on perceptual and radiological-feature metrics. This work establishes that foundation models, when appropriately adapted, offer a scalable alternative for interpretable, data-efficient medical image reconstruction. Code is available at https://github.com/ahmetemirdagi/CT-EditMAR.
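The multi-reference conditioning strategy amounts to presenting the model with clean exemplars from other subjects next to the corrupted slice, so the restoration is grounded in category-specific anatomy rather than hallucinated texture. The abstract does not specify how the inputs are packed; the sketch below shows one plausible assembly (our own hypothetical helper, not the paper's code), stacking K clean references and the corrupted slice into a single conditioning tensor.

```python
import numpy as np

def build_conditioning(corrupted, exemplars):
    """Assemble a multi-reference conditioning stack (illustrative only).

    corrupted: (H, W) metal-corrupted CT slice.
    exemplars: list of (H, W) clean slices from *other* subjects, same category.
    Returns a (K + 1, H, W) array: K clean references followed by the input,
    which a diffusion model could consume as extra conditioning channels.
    """
    refs = np.stack(exemplars, axis=0)                 # (K, H, W) clean anatomy
    return np.concatenate([refs, corrupted[None]], 0)  # corrupted slice goes last
```

The key design point from the paper is that the exemplars come from unrelated subjects, so the model can only borrow category-level anatomical structure from them, not subject-specific detail.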