AEGIS: A Holistic Benchmark for Evaluating Forensic Analysis of AI-Generated Academic Images
arXiv cs.CV / 5/1/2026
Key Points
- The paper introduces AEGIS, a holistic benchmark to evaluate forensic analysis of AI-generated academic images across seven academic categories and 39 fine-grained subtypes.
- AEGIS goes beyond prior work by incorporating domain-specific complexity: even GPT-5.1 reaches only 48.80% overall performance, and expert models show limited localization accuracy (IoU of 30.09%).
- It simulates diverse forgeries using four common academic forgery strategies implemented across 25 generative models, and finds that forensic accuracy often remains below 50%, lagging behind generation capabilities.
- The benchmark evaluates forensics along multiple dimensions (detection, reasoning, and localization) and shows complementary strengths between model families: MLLMs reach 84.74% on textual artifact recognition, while expert detectors reach 79.54% on binary authenticity detection.
- By testing 25 leading MLLMs, nine expert models, and a unified multimodal understanding/generation model, AEGIS is positioned as a diagnostic testbed exposing fundamental limitations in current academic image forensics.
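The localization figure above is reported as IoU (intersection over union). As a rough illustration only, the sketch below assumes localization is scored on pixel-level binary masks (1 = forged pixel) with per-image IoU averaged into a percentage; the summary does not say whether AEGIS uses masks or bounding boxes, or how it aggregates scores, and the helper names `mask_iou` and `mean_iou_percent` are hypothetical.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, gt: Mask) -> float:
    """IoU between two binary masks of equal shape (1 = forged pixel)."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += bool(p) and bool(g)  # pixel flagged in both masks
            union += bool(p) or bool(g)   # pixel flagged in either mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou_percent(pairs: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average per-image IoU over (prediction, ground truth) pairs, as a percentage."""
    if not pairs:
        raise ValueError("need at least one mask pair")
    scores = [mask_iou(p, g) for p, g in pairs]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    gt = [[1, 0, 0], [0, 1, 1]]
    print(mask_iou(pred, gt))  # 2 shared pixels / 4 flagged pixels = 0.5
    print(mean_iou_percent([(pred, gt), ([[1]], [[1]])]))  # (0.5 + 1.0) / 2 -> 75.0
```

Under this reading, an IoU of 30.09% means a predicted forgery region typically overlaps less than a third of the combined predicted and true regions.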