GPT4o-Receipt: A Dataset and Human Study for AI-Generated Document Forensics
arXiv cs.AI / 3/13/2026
Key Points
- GPT4o-Receipt provides a dataset of 1,235 receipt images pairing GPT-4o-generated receipts with authentic receipts, plus evaluation across five state-of-the-art multimodal LLMs and a 30-annotator perceptual study.
- The study finds that human annotators are better at perceiving visual AI artifacts but worse at detecting AI-generated documents overall: they show the largest visual discrimination gap, yet their binary detection F1 is lower than that of Claude Sonnet 4 and Gemini 2.5 Flash.
- The key forensic signal in AI-generated receipts is arithmetic inconsistency, such as an incorrect subtotal. These errors are invisible to visual inspection but can be verified by LLMs in milliseconds.
- The results reveal dramatic performance disparities and calibration differences among models, which makes simple accuracy metrics unreliable for detector selection. The authors release GPT4o-Receipt and all results publicly to support future research in AI document forensics.
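
The arithmetic signal described in the key points can be illustrated with a small consistency check. This is a minimal sketch, not the authors' method: it assumes the receipt fields have already been extracted into numbers (OCR and parsing are out of scope), and the function name `check_receipt`, its parameters, and the one-cent tolerance are all hypothetical choices made for illustration.

```python
from decimal import Decimal


def check_receipt(items, subtotal, tax, total, tolerance=Decimal("0.01")):
    """Return a list of arithmetic inconsistencies in a parsed receipt.

    items is a list of (quantity, unit_price) pairs. All field names and
    the tolerance are illustrative assumptions, not taken from the paper.
    """
    problems = []

    # Recompute the subtotal from the line items.
    computed_subtotal = sum((qty * price for qty, price in items), Decimal("0"))
    if abs(computed_subtotal - subtotal) > tolerance:
        problems.append(
            f"subtotal: printed {subtotal}, computed {computed_subtotal}"
        )

    # Check that the printed subtotal plus tax matches the printed total.
    computed_total = subtotal + tax
    if abs(computed_total - total) > tolerance:
        problems.append(f"total: printed {total}, computed {computed_total}")

    return problems


items = [(2, Decimal("3.50")), (1, Decimal("4.25"))]

# A consistent receipt yields no findings.
print(check_receipt(items, Decimal("11.25"), Decimal("0.90"), Decimal("12.15")))

# A receipt whose printed subtotal does not match its line items is flagged.
print(check_receipt(items, Decimal("12.25"), Decimal("0.90"), Decimal("13.15")))
```

`Decimal` is used instead of floating point so that currency amounts compare exactly. A receipt that passes this check is not thereby authentic; the check only surfaces the kind of inconsistency the summary identifies as hard to spot by eye.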