SciFigDetect: A Benchmark for AI-Generated Scientific Figure Detection

arXiv cs.CV / 4/10/2026


Key Points

  • SciFigDetect is introduced as the first benchmark specifically for detecting AI-generated scientific figures, addressing how this domain differs from open-domain image forensics due to structure, dense text, and scholarly semantics.
  • The dataset is built using an agent-based pipeline that retrieves licensed papers, performs multimodal understanding of text and figures, synthesizes candidate figures via multiple sources, and applies a review-driven refinement loop.
  • It includes multiple figure categories and aligned real–synthetic pairs, enabling evaluation across zero-shot transfer, cross-generator generalization, and degraded-image scenarios.
  • Benchmark results indicate current detectors fail dramatically in zero-shot transfer, overfit strongly to specific generators, and are fragile under common post-processing corruptions.
  • The authors provide the dataset publicly to support research into more robust and generalizable scientific-figure forensics and research-integrity tooling.
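The review-driven refinement loop described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: `refine_figure`, `generate`, and `review` are hypothetical names, and the real pipeline relies on multimodal models and actual image generators rather than the stubs shown here.

```python
def refine_figure(prompt, generate, review, max_rounds=3):
    """Generate a candidate figure, then iteratively revise the prompt
    until a reviewer accepts the result or rounds run out.
    (Hypothetical sketch of a review-driven refinement loop.)"""
    for _ in range(max_rounds):
        candidate = generate(prompt)
        verdict = review(candidate)  # e.g. {"accept": bool, "feedback": str}
        if verdict["accept"]:
            return candidate
        # Fold reviewer feedback back into the prompt for the next round.
        prompt = prompt + " | revise: " + verdict["feedback"]
    return None  # rejected after all rounds


# Toy demonstration with stub generator/reviewer:
def toy_generate(p):
    return {"prompt": p}

def toy_review(c):
    # Accept only once a revision has been applied.
    ok = "revise" in c["prompt"]
    return {"accept": ok, "feedback": "add axis labels"}

result = refine_figure("bar chart of accuracy", toy_generate, toy_review)
```

The key design point is that rejection does not discard a candidate outright; reviewer feedback is recycled into the next synthesis attempt, which is what makes the pipeline "review-driven" rather than a one-shot filter.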

Abstract

Modern multimodal generators can now produce scientific figures at near-publishable quality, creating a new challenge for visual forensics and research integrity. Unlike conventional AI-generated natural images, scientific figures are structured, text-dense, and tightly aligned with scholarly semantics, making them a distinct and difficult detection target. However, existing AI-generated image detection benchmarks and methods are almost entirely developed for open-domain imagery, leaving this setting largely unexplored. We present the first benchmark for AI-generated scientific figure detection. To construct it, we develop an agent-based data pipeline that retrieves licensed source papers, performs multimodal understanding of paper text and figures, builds structured prompts, synthesizes candidate figures, and filters them through a review-driven refinement loop. The resulting benchmark covers multiple figure categories, multiple generation sources, and aligned real–synthetic pairs. We benchmark representative detectors under zero-shot, cross-generator, and degraded-image settings. Results show that current methods fail dramatically in zero-shot transfer, exhibit strong generator-specific overfitting, and remain fragile under common post-processing corruptions. These findings reveal a substantial gap between existing AIGI detection capabilities and the emerging distribution of high-quality scientific figures. We hope this benchmark can serve as a foundation for future research on robust and generalizable scientific-figure forensics. The dataset is available at https://github.com/Joyce-yoyo/SciFigDetect.
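The degraded-image setting can be pictured as a simple robustness check: score a detector on clean inputs, then on the same inputs after a corruption, and compare. The sketch below is illustrative only; `toy_detector`, `add_noise`, and the 4-pixel "images" are stand-ins I introduce for clarity, not the paper's detectors, corruptions, or data.

```python
import random

def accuracy(detector, images, labels):
    """Fraction of images the detector labels correctly."""
    preds = [detector(img) for img in images]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def add_noise(img, sigma=0.3, seed=0):
    """One example corruption: additive Gaussian noise (a stand-in for
    real post-processing such as JPEG compression or rescaling)."""
    rng = random.Random(seed)
    return [x + rng.gauss(0, sigma) for x in img]

def toy_detector(img):
    """Toy stand-in for a real detector: thresholds the mean pixel value,
    returning 1 for 'synthetic' and 0 for 'real'."""
    return int(sum(img) / len(img) > 0.5)

# Two toy 4-pixel "images": one labeled synthetic (1), one real (0).
images = [[0.9] * 4, [0.1] * 4]
labels = [1, 0]

clean_acc = accuracy(toy_detector, images, labels)
noisy = [add_noise(img, sigma=1.0, seed=7) for img in images]
noisy_acc = accuracy(toy_detector, noisy, labels)
```

The benchmark's finding is that the gap between `clean_acc` and its degraded counterpart is large for current detectors, i.e. their decisions hinge on low-level statistics that common post-processing destroys.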