Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers
arXiv cs.CL / 4/3/2026
Key Points
- The paper introduces Paper Reconstruction Evaluation (PaperRecon), a framework that tests AI paper-writing systems by having them generate a full draft from an automatically created overview of a source paper, then comparing that draft against the original (a minimal sketch of this loop follows the list).
- It evaluates two separate risk/quality dimensions: Presentation quality (via a rubric) and Hallucination risk (via agentic evaluation grounded in the original paper).
- The authors release PaperWrite-Bench, comprising 51 post-2025 top-venue papers across diverse domains to support systematic evaluation of coding-agent paper writing.
- Experimental results show a presentation-hallucination trade-off across AI systems: ClaudeCode tends to score higher on presentation but averages over 10 hallucinations per paper, while Codex produces fewer hallucinations at the expense of presentation quality.
- The work is positioned as an early step toward standardizing reliability and risk assessment for AI-driven research-paper generation.
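The sketch below shows the shape of the reconstruct-then-grade loop described in the key points. It is a hypothetical illustration, not the authors' implementation: every name here (make_overview, score_presentation, find_hallucinations, paper_recon) is a placeholder, and the real framework uses LLM agents and a grading rubric that this summary does not specify.

```python
# Hypothetical sketch of the PaperRecon loop. The placeholder bodies below
# stand in for the LLM-driven components described in the paper summary.
from dataclasses import dataclass


@dataclass
class ReconReport:
    presentation_score: float   # rubric-based presentation quality
    hallucinations: list[str]   # draft claims not grounded in the source paper


def make_overview(source_paper: str) -> str:
    # Placeholder: the framework auto-generates an overview of the source.
    return source_paper[:500]


def score_presentation(draft: str) -> float:
    # Placeholder for the rubric-based presentation grader.
    return 0.0


def find_hallucinations(draft: str, source_paper: str) -> list[str]:
    # Placeholder for the agentic check that grounds each draft claim
    # in the original paper and flags the unsupported ones.
    return [s for s in draft.split(". ") if s and s not in source_paper]


def paper_recon(source_paper: str, writer_agent) -> ReconReport:
    """Generate a full draft from an overview, then grade both dimensions."""
    overview = make_overview(source_paper)
    draft = writer_agent(overview)   # the AI system under test writes the paper
    return ReconReport(
        presentation_score=score_presentation(draft),
        hallucinations=find_hallucinations(draft, source_paper),
    )
```

In the paper's setup, writer_agent would be a coding agent such as ClaudeCode or Codex, and the report's two fields map onto the presentation and hallucination dimensions above.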