Abstract
Recent AI media detectors report near-perfect performance under clean laboratory evaluation, yet their robustness under realistic deployment conditions remains underexplored. In practice, AI-generated images are resized, compressed, re-encoded, and visually modified before being shared on online platforms. We argue that this creates a deployment gap between laboratory robustness and real-world reliability.
In this work, we introduce a platform-aware adversarial evaluation framework for AI media detection that explicitly models deployment transforms (e.g., resizing, compression, screenshot-style distortions) and constrains perturbations to visually plausible meme-style bands rather than full-image noise. Under this threat model, detectors that achieve AUC ≈ 0.99 in clean settings degrade sharply: per-image platform-aware attacks substantially reduce AUC and drive high fake-to-real misclassification rates despite the strict visual constraints. We further demonstrate that universal perturbations exist even under localized band constraints, revealing shared vulnerability directions across inputs. Beyond accuracy degradation, we observe pronounced calibration collapse under attack, with detectors becoming confidently incorrect.
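To make the threat model concrete, the sketch below simulates a share pipeline (resize plus JPEG re-encoding) and restricts an adversarial perturbation to meme-style top and bottom bands. This is a minimal illustrative sketch, not the released framework; all function names, parameters, and the 15% band fraction are assumptions made here for clarity.

```python
# Illustrative sketch of platform-aware transforms and band-constrained
# perturbations (assumed names/parameters; not the paper's released code).
import io
import numpy as np
from PIL import Image

def platform_transform(img: Image.Image, size=(512, 512), jpeg_quality=75) -> Image.Image:
    """Simulate a sharing pipeline: resize, then lossy JPEG re-encoding."""
    img = img.resize(size, Image.BILINEAR)
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def band_mask(h: int, w: int, band_frac: float = 0.15) -> np.ndarray:
    """Mask that confines perturbations to meme-style top/bottom bands."""
    mask = np.zeros((h, w, 1), dtype=np.float32)
    band = int(h * band_frac)
    mask[:band] = 1.0        # top band
    mask[h - band:] = 1.0    # bottom band
    return mask

def apply_band_perturbation(img: Image.Image, delta: np.ndarray, eps: float = 8 / 255) -> Image.Image:
    """Add an L-infinity-bounded perturbation only inside the band mask."""
    x = np.asarray(img, dtype=np.float32) / 255.0
    mask = band_mask(*x.shape[:2])
    delta = np.clip(delta, -eps, eps) * mask   # zero outside the bands
    x_adv = np.clip(x + delta, 0.0, 1.0)
    return Image.fromarray((x_adv * 255).astype(np.uint8))
```

In this setup, a detector would be evaluated on `platform_transform(apply_band_perturbation(...))` rather than on the raw perturbed image, so the attack must survive the deployment transforms to succeed.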
Our findings show that robustness measured under clean conditions substantially overestimates robustness in deployment. We advocate for platform-aware evaluation as a necessary component of future AI media security benchmarks and release our evaluation framework to facilitate standardized robustness assessment.