Off-the-shelf Vision Models Benefit Image Manipulation Localization

arXiv cs.CV / 4/13/2026


Key Points

  • The paper argues that image manipulation localization (IML) and general vision tasks should be treated as connected directions, with semantic priors potentially helping IML performance.
  • It introduces ReVi, a trainable adapter designed to repurpose off-the-shelf vision models (including image generation and segmentation networks) for IML without altering the base models.
  • ReVi uses an approach inspired by robust principal component analysis to separate semantic redundancy from manipulation-specific signals and then amplify the manipulation-relevant components.
  • The method is efficient to deploy because it freezes the original vision model parameters and fine-tunes only the lightweight adapter, avoiding extensive redesign and full retraining.
  • Experiments indicate improved IML results and suggest that scalable IML frameworks can be built by plugging adapters into existing general-purpose vision backbones.
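The summary does not give ReVi's actual formulation, but the robust-PCA idea it alludes to can be sketched with standard principal component pursuit: decompose a matrix into a low-rank part (semantic redundancy, in ReVi's framing) plus a sparse part (manipulation-specific signal). The minimal solver below uses the usual alternating singular-value thresholding and soft thresholding; the function and parameter names (`rpca`, `lam`, `mu`) are our own, not from the paper.

```python
import numpy as np

def rpca(M, lam=None, mu=None, n_iter=200):
    """Split M ~= L + S, with L low-rank and S sparse, via a simple
    ADMM-style principal component pursuit (illustrative sketch)."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else (m * n) / (4.0 * np.abs(M).sum())
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # dual variable enforcing M = L + S
    for _ in range(n_iter):
        # Low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Sparse update: entrywise soft thresholding
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # Dual ascent on the equality constraint
        Y += mu * (M - L - S)
    return L, S

# Tiny demo: a rank-2 "semantic" matrix plus a few large sparse spikes.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
sparse = np.zeros((40, 40))
sparse.flat[rng.choice(1600, 40, replace=False)] = 10.0
L_hat, S_hat = rpca(low_rank + sparse)
```

In the paper's framing (per the key points above), the low-rank term would correspond to the semantic redundancy to discard and the sparse term to the manipulation-relevant signal the adapter amplifies; the actual adapter presumably operates on deep backbone features rather than raw pixels.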

Abstract

Image manipulation localization (IML) and general vision tasks are typically treated as two separate research directions due to the fundamental differences between manipulation-specific and semantic features. In this paper, however, we bridge this gap by introducing a fresh perspective: these two directions are intrinsically connected, and general semantic priors can benefit IML. Building on this insight, we propose a novel trainable adapter (named ReVi) that repurposes existing off-the-shelf general-purpose vision models (e.g., image generation and segmentation networks) for IML. Inspired by robust principal component analysis, the adapter disentangles semantic redundancy from manipulation-specific information embedded in these models and selectively enhances the latter. Unlike existing IML methods that require extensive model redesign and full retraining, our method keeps the parameters of the off-the-shelf vision models frozen and fine-tunes only the proposed adapter. The experimental results demonstrate the superiority of our method, showing the potential for scalable IML frameworks.
