When the Forger Is the Judge: GPT-Image-2 Cannot Recognize Its Own Faked Documents
arXiv cs.CV / 4/29/2026
Key Points
- The paper reports that GPT-Image-2 can generate or edit document images (e.g., receipt fields) in under a second and at low cost, blurring the visual line between authentic and AI-altered documents.
- The authors release AIForge-Doc v2, a paired dataset of 3,066 GPT-Image-2 forgeries with pixel-precise masks in DocTamper-compatible format, along with benchmarks using human inspection and three computational detection approaches.
- Human inspectors’ accuracy in distinguishing AI forgeries from real documents is 0.501 (near chance), and the computational judges perform only modestly above chance (TruFor 0.599, DocTamper 0.585, and GPT-Image-2 used as a zero-shot self-judge 0.532).
- The “self-judge” strategy fails consistently across multiple prompt and ambiguity-handling policies, with AUC never exceeding 0.59, indicating GPT-Image-2 cannot reliably recognize its own inpainting/editing.
- Calibration on same-domain traditional tampering shows that the detectors work well on non-AI edits (TruFor AUC 0.962, DocTamper AUC 0.852). Performance drops by 0.27–0.36 when the edits are GPT-Image-2 inpainting instead, which isolates a detection gap specific to GPT-Image-2. The dataset, pipeline, protocol, and calibration sets are released.
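
The reported 0.27–0.36 drop can be checked against the per-detector figures quoted in the key points. The sketch below assumes the TruFor 0.599 and DocTamper 0.585 scores on GPT-Image-2 forgeries are AUCs, like the calibration figures; the summary does not name the metric explicitly, and the variable and function names are illustrative only.

```python
# AUCs on traditional (non-AI) tampering, as quoted in the summary.
traditional_auc = {"TruFor": 0.962, "DocTamper": 0.852}

# Scores on GPT-Image-2 forgeries, as quoted in the summary.
# Assumption: these are AUCs, so they can be compared directly.
gpt_image_2_auc = {"TruFor": 0.599, "DocTamper": 0.585}


def auc_drop(before, after):
    """Return the per-detector drop in AUC, rounded to three decimals."""
    return {name: round(before[name] - after[name], 3) for name in before}


drops = auc_drop(traditional_auc, gpt_image_2_auc)
for name, drop in drops.items():
    print(f"{name}: AUC drop of {drop:.3f}")
```

Under that assumption, the two differences (about 0.36 for TruFor and about 0.27 for DocTamper) match the endpoints of the range stated in the summary.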