Adaptive Forensic Feature Refinement via Intrinsic Importance Perception

arXiv cs.CV / 4/21/2026

Key Points

  • The paper targets synthetic image detection (SID), focusing on cross-distribution generalization when images come from previously unseen generative sources.
  • Visual foundation models (VFMs) can improve SID through priors learned in image–text pretraining, but existing adaptation methods are described as too coarse: they use final-layer or simply fused multi-layer features without modeling which representation level best carries forgery cues, and direct fine-tuning risks degrading open-set generalization.
  • The authors reformulate VFM adaptation as a joint optimization that (1) finds the most forgery-discriminative representational layer and (2) limits how much task learning perturbs the pretrained cross-modal structure.
  • They propose I2P (Intrinsic Importance Perception), which adaptively selects critical layer representations and performs task-driven updates within a low-sensitivity parameter subspace to boost task specificity while preserving transferability.
  • Overall, the contribution is a more fine-grained, structure-preserving adaptation strategy for VFM-based SID to better handle unknown generation sources.
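
The layer-selection idea in the key points can be illustrated with a small sketch. This summary does not say how I2P measures layer importance, so the Fisher-style separability score below is an assumption standing in for it, and the names `fisher_score` and `select_critical_layer` are hypothetical. The features are assumed to be per-layer embeddings already extracted from a frozen VFM for real and synthetic images.

```python
from statistics import mean, pvariance


def fisher_score(real_feats, fake_feats):
    """Separability of real vs. synthetic features at one layer.

    Assumption: a Fisher-style ratio (squared gap between class means over
    summed within-class variances), averaged over feature dimensions.
    This is a stand-in criterion, not the paper's importance measure.
    """
    dims = len(real_feats[0])
    per_dim = []
    for d in range(dims):
        real_d = [x[d] for x in real_feats]
        fake_d = [x[d] for x in fake_feats]
        gap = (mean(real_d) - mean(fake_d)) ** 2
        spread = pvariance(real_d) + pvariance(fake_d) + 1e-8  # avoid divide-by-zero
        per_dim.append(gap / spread)
    return mean(per_dim)


def select_critical_layer(layer_feats):
    """Pick the layer whose features best separate the two classes.

    layer_feats maps a layer name to (real_feats, fake_feats), where each
    element is a list of equal-length feature vectors.
    """
    scores = {name: fisher_score(r, f) for name, (r, f) in layer_feats.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

With real data, `layer_feats` would hold one entry per transformer block, and the selected layer's representation would feed the detection head instead of the final-layer output.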

Abstract

With the rapid development of generative models and multimodal content editing technologies, the key challenge in synthetic image detection (SID) is cross-distribution generalization to unknown generation sources. In recent years, visual foundation models (VFMs), which acquire rich visual priors through large-scale image-text alignment pretraining, have become a promising route for improving the generalization ability of SID. However, existing VFM-based methods remain relatively coarse-grained in their adaptation strategies. They typically either use the final-layer representations of the VFM directly or simply fuse multi-layer features, and so lack explicit modeling of the optimal representational hierarchy for transferable forgery cues. Meanwhile, although directly fine-tuning the VFM can enhance task adaptation, it may also damage the cross-modal pretrained structure that supports open-set generalization. To address this task-specific tension, we reformulate VFM adaptation for SID as a joint optimization problem: it is necessary both to identify the critical representational layer best suited to carrying forgery-discriminative information and to constrain the disturbance that task knowledge injection causes to the pretrained structure. Based on this, we propose I2P, an SID framework centered on intrinsic importance perception. I2P first adaptively identifies the critical layer representations that are most discriminative for SID, and then constrains task-driven parameter updates to a low-sensitivity parameter subspace, thereby improving task specificity while preserving the transferable structure of pretrained representations as much as possible.
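
The second step described in the abstract, restricting task-driven updates to a low-sensitivity parameter subspace, can be sketched as a masked gradient step. The abstract does not define sensitivity, so this sketch assumes a per-parameter score is already available (for example, the squared gradient of a pretraining alignment loss, which is an assumption and not taken from the paper). The helper names `low_sensitivity_mask` and `masked_sgd_step` and the `keep_ratio` parameter are hypothetical.

```python
def low_sensitivity_mask(sensitivity, keep_ratio):
    """Return a 0/1 mask over parameters.

    Marks the `keep_ratio` fraction of parameters with the SMALLEST
    sensitivity as trainable (1.0); all others stay frozen (0.0).
    Assumption: `sensitivity` is a precomputed per-parameter score of how
    strongly a change would disturb the pretrained structure.
    """
    k = max(1, int(len(sensitivity) * keep_ratio))
    order = sorted(range(len(sensitivity)), key=lambda i: sensitivity[i])
    trainable = set(order[:k])
    return [1.0 if i in trainable else 0.0 for i in range(len(sensitivity))]


def masked_sgd_step(params, task_grads, mask, lr):
    """Apply one SGD step, updating only the parameters selected by the mask."""
    return [p - lr * m * g for p, g, m in zip(params, task_grads, mask)]


# Usage: four scalar parameters, half of them allowed to move.
params = [1.0, 2.0, 3.0, 4.0]
sensitivity = [0.9, 0.01, 0.5, 0.02]   # hypothetical scores
task_grads = [1.0, 1.0, 1.0, 1.0]      # hypothetical SID-loss gradients

mask = low_sensitivity_mask(sensitivity, keep_ratio=0.5)
updated = masked_sgd_step(params, task_grads, mask, lr=0.1)
```

Here only the second and fourth parameters change, since they have the lowest sensitivity; the high-sensitivity parameters keep their pretrained values. A hard mask is the simplest choice for illustration; a soft weighting or a low-rank projection would be alternative ways to realize the same constraint.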