Beyond Semantic Priors: Mitigating Optimization Collapse for Generalizable Visual Forensics

arXiv cs.CV · March 26, 2026


Key Points

  • The paper studies a failure mode, termed “Optimization Collapse,” in visual forensics detectors trained with Sharpness-Aware Minimization (SAM): performance degrades to near-random guessing on non-semantic deepfakes once the perturbation radius exceeds a narrow threshold.
  • It introduces the Critical Optimization Radius (COR) to formalize geometric stability of the optimization landscape and the Gradient Signal-to-Noise Ratio (GSNR) to estimate intrinsic generalization potential.
  • Theoretical results show COR increases monotonically with GSNR, linking the collapse to layer-wise attenuation of gradient fidelity rather than to perturbation size alone.
  • Instead of merely shrinking the perturbation radius (which stabilizes training but does not restore intrinsic generalization), the authors propose CoRIT, which combines a contrastive gradient proxy with training-free mechanisms for region refinement, signal preservation, and hierarchical representation integration.
  • Experiments report that CoRIT mitigates Optimization Collapse and improves state-of-the-art generalization on cross-domain and universal forgery benchmarks.
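To make the perturbation-radius mechanism concrete: a standard SAM update first ascends to the sharpest point within a ball of radius ρ around the current weights, then descends using the gradient computed there. The sketch below is a minimal toy illustration of that two-step update on a hand-written scalar loss, not the paper's detector or training setup; the function names (`loss`, `grad`, `sam_step`) and the toy objective are illustrative assumptions.

```python
import numpy as np

def loss(w):
    # Toy non-convex stand-in for a detector's training loss.
    return np.sum(w ** 2) + 0.1 * np.sum(np.sin(5 * w))

def grad(w):
    # Analytic gradient of the toy loss above.
    return 2 * w + 0.5 * np.cos(5 * w)

def sam_step(w, rho, lr):
    # Step 1: ascend to the (approximate) worst point within radius rho.
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: descend using the gradient evaluated at the perturbed weights.
    g_adv = grad(w + eps)
    return w - lr * g_adv

w = np.array([1.0, -0.8])
for _ in range(10):
    w = sam_step(w, rho=0.05, lr=0.1)
```

The perturbation radius `rho` is the quantity whose growth past a narrow threshold the paper associates with Optimization Collapse; in this toy setting small `rho` leaves training well behaved, which mirrors the authors' observation that shrinking the radius stabilizes convergence without addressing the underlying generalization deficit.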

Abstract

While Vision-Language Models (VLMs) like CLIP have emerged as a dominant paradigm for generalizable deepfake detection, a representational disconnect remains: their semantic-centric pre-training is ill-suited for capturing non-semantic artifacts inherent to hyper-realistic synthesis. In this work, we identify a failure mode termed Optimization Collapse, where detectors trained with Sharpness-Aware Minimization (SAM) degenerate to random guessing on non-semantic forgeries once the perturbation radius exceeds a narrow threshold. To theoretically formalize this collapse, we propose the Critical Optimization Radius (COR) to quantify the geometric stability of the optimization landscape, and leverage the Gradient Signal-to-Noise Ratio (GSNR) to measure generalization potential. We establish a theorem proving that COR increases monotonically with GSNR, thereby revealing that the geometric instability of SAM optimization originates from degraded intrinsic generalization potential. This result identifies the layer-wise attenuation of GSNR as the root cause of Optimization Collapse in detecting non-semantic forgeries. Although naively reducing perturbation radius yields stable convergence under SAM, it merely treats the symptom without mitigating the intrinsic generalization degradation, necessitating enhanced gradient fidelity. Building on this insight, we propose the Contrastive Regional Injection Transformer (CoRIT), which integrates a computationally efficient Contrastive Gradient Proxy (CGP) with three training-free strategies: Region Refinement Mask to suppress CGP variance, Regional Signal Injection to preserve CGP magnitude, and Hierarchical Representation Integration to attain more generalizable representations. Extensive experiments demonstrate that CoRIT mitigates optimization collapse and achieves state-of-the-art generalization across cross-domain and universal forgery benchmarks.
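The Gradient Signal-to-Noise Ratio that the abstract leverages is commonly defined per parameter as the squared mean of the per-sample gradients divided by their variance, so that consistent (signal-bearing) gradient directions score high and noisy ones score low. The snippet below is a minimal sketch of that estimator from a matrix of per-sample gradients; the synthetic "layers" and the `gsnr` helper are illustrative assumptions, not the paper's measurement pipeline.

```python
import numpy as np

def gsnr(per_sample_grads):
    # per_sample_grads: (n_samples, n_params) array of per-example gradients.
    # GSNR_j = E[g_j]^2 / Var[g_j], computed element-wise per parameter.
    mean = per_sample_grads.mean(axis=0)
    var = per_sample_grads.var(axis=0) + 1e-12
    return (mean ** 2) / var

rng = np.random.default_rng(0)
# Two simulated "layers" with the same mean gradient but different noise:
# layer_a has high gradient fidelity, layer_b has attenuated fidelity.
layer_a = 1.0 + 0.1 * rng.standard_normal((1000, 8))
layer_b = 1.0 + 2.0 * rng.standard_normal((1000, 8))

print(gsnr(layer_a).mean(), gsnr(layer_b).mean())
```

Averaging the per-parameter GSNR within a layer gives a layer-level fidelity score; the paper's theorem ties a higher score of this kind to a larger Critical Optimization Radius, i.e., a wider range of stable SAM perturbations.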