Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning
arXiv cs.AI / 3/31/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that current academic-paper reasoning benchmarks are mostly search-oriented and therefore do not capture researcher-style full-document understanding, cross-checking, and verification.
- It introduces ScholScan, a new scan-oriented benchmark that tasks multimodal LLMs with reading entire papers and identifying consistency issues.
- ScholScan includes 1,800 annotated questions across nine error categories, covering 13 natural-science domains and 715 papers, with evidence localization and reasoning traces plus a unified evaluation protocol (a minimal sketch of such a record follows this list).
- Experiments with 15 models across 24 input settings show that retrieval-augmented generation (RAG) does not yield significant gains, highlighting systematic weaknesses of current MLLMs on scan-oriented tasks.
- The authors position ScholScan as the representative benchmark for the scan-oriented paradigm they propose for academic-paper reasoning.
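
The summary above does not expose ScholScan's actual data schema or metrics. The following is a minimal Python sketch, assuming each question carries an error-category label, page-level evidence annotations, and a reasoning trace, and that the unified protocol scores category accuracy together with evidence localization; the names `ScanItem`, `ModelAnswer`, and `score`, and all field names, are illustrative rather than the paper's schema.

```python
from dataclasses import dataclass, field


@dataclass
class ScanItem:
    """Hypothetical ScholScan-style record; field names are illustrative."""
    paper_id: str                 # source paper (one of the 715 papers)
    domain: str                   # one of the 13 natural-science domains
    error_category: str           # one of the nine annotated error categories
    question: str                 # consistency-checking question over the full paper
    evidence_pages: list[int] = field(default_factory=list)  # annotator-marked evidence pages
    reasoning_trace: str = ""     # annotated reasoning supporting the label


@dataclass
class ModelAnswer:
    """A model's prediction for one item."""
    predicted_category: str
    cited_pages: list[int] = field(default_factory=list)


def score(item: ScanItem, answer: ModelAnswer) -> dict:
    """Toy scorer: category accuracy plus evidence-localization recall."""
    category_correct = answer.predicted_category == item.error_category
    gold = set(item.evidence_pages)
    recall = (len(gold & set(answer.cited_pages)) / len(gold)) if gold else 0.0
    return {"category_correct": category_correct, "evidence_recall": recall}


if __name__ == "__main__":
    item = ScanItem(
        paper_id="arXiv:0000.00000",
        domain="physics",
        error_category="figure-text inconsistency",
        question="Does the sample size reported in Sec. 3 match Table 2?",
        evidence_pages=[3, 7],
        reasoning_trace="Sec. 3 states n=120; Table 2 sums to 118.",
    )
    answer = ModelAnswer(predicted_category="figure-text inconsistency", cited_pages=[7])
    print(score(item, answer))  # {'category_correct': True, 'evidence_recall': 0.5}
```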