Quality-Aware Calibration for AI-Generated Image Detection in the Wild

arXiv cs.CV · April 17, 2026


Key Points

  • The paper argues that AI-generated image detection can be unreliable in the wild because viral sharing creates multiple near-duplicate versions that degrade through repeated recompression, resizing, and cropping.
  • It proposes QuAD (Quality-Aware calibration with near-Duplicates), which retrieves a query image’s online near-duplicates, runs detection on each, and aggregates scores using a quality estimate per instance.
  • To evaluate at scale, the authors introduce AncesTree (an in-lab 136k-image dataset modeled as stochastic degradation trees) and ReWIND (a real-world ~10k near-duplicate dataset from viral web content).
  • Experiments across multiple state-of-the-art detectors show that QuAD’s quality-aware fusion improves performance, achieving about an 8% average gain in balanced accuracy versus simple averaging.
  • The work emphasizes that reliable detection of AI-generated content in real applications should jointly analyze all available online versions rather than treating each image in isolation.
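The article describes QuAD's core mechanism only at a high level: run a detector on every retrieved near-duplicate, then fuse the per-instance scores weighted by an estimated quality. The exact fusion rule is not given here, so the following is a minimal sketch of one plausible quality-weighted aggregation; the function name `quad_fuse` and the normalized-weight scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quad_fuse(scores, qualities):
    """Illustrative quality-weighted fusion of detector scores.

    scores    : per-near-duplicate detector outputs in [0, 1]
                (e.g., probability of being AI-generated).
    qualities : per-instance quality estimates (higher = more
                reliable, i.e., fewer degradation steps).

    Returns a single fused score. Heavily degraded copies
    (low quality) contribute less to the final decision than
    clean copies, which is the intuition behind QuAD's
    quality-aware calibration.
    """
    s = np.asarray(scores, dtype=float)
    q = np.asarray(qualities, dtype=float)
    if s.shape != q.shape or s.size == 0:
        raise ValueError("scores and qualities must be non-empty and aligned")
    # Normalize qualities into convex weights so the fused
    # score stays inside the range of the input scores.
    w = q / q.sum()
    return float(np.dot(w, s))

# Example: a clean copy scores 0.9, a heavily recompressed
# copy scores 0.2; the clean copy dominates the fused result.
fused = quad_fuse([0.9, 0.2], [1.0, 0.1])
```

With equal qualities this reduces to the plain averaging that the paper uses as its baseline; the reported ~8% balanced-accuracy gain comes from down-weighting the degraded instances instead.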

Abstract

Significant progress has been made in detecting synthetic images; however, most existing approaches operate on a single image instance and overlook a key characteristic of real-world dissemination: as viral images circulate on the web, multiple near-duplicate versions appear and lose quality through repeated operations such as recompression, resizing, and cropping. As a consequence, the same image may yield inconsistent forensic predictions depending on which version is analyzed. To address this issue, we propose QuAD (Quality-Aware calibration with near-Duplicates), a novel framework that makes decisions based on all available near-duplicates of the same image. Given a query, we retrieve its online near-duplicates and feed them to a detector; the resulting scores are then aggregated according to the estimated quality of each instance. In this way, we exploit all available evidence while accounting for the reduced reliability of images impaired by multiple processing steps. To support large-scale evaluation, we introduce two datasets: AncesTree, an in-lab dataset of 136k images organized in stochastic degradation trees that simulate online reposting dynamics, and ReWIND, a real-world dataset of nearly 10k near-duplicate images collected from viral web content. Experiments on several state-of-the-art detectors show that our quality-aware fusion consistently improves their performance, with an average gain of around 8% in balanced accuracy compared to plain averaging. Our results highlight the importance of jointly processing all the versions of an image available online to achieve reliable detection of AI-generated content in real-world applications. Code and data are publicly available at https://grip-unina.github.io/QuAD/