DISCO: Document Intelligence Suite for COmparative Evaluation

arXiv cs.CL · March 26, 2026


Key Points

  • DISCO is introduced as a Document Intelligence Suite that separately evaluates OCR pipelines and vision-language models (VLMs) on parsing and question answering across varied document types.
  • The benchmark covers challenging real-world characteristics including handwritten text, multilingual scripts, medical forms, infographics, and multi-page documents.
  • Results show large performance differences by task and document complexity, indicating that document processing strategy should be selected with an awareness of structure and reasoning needs.
  • OCR pipelines tend to work better on handwriting and long or multi-page documents, because their explicit text grounding supports text-heavy reasoning, while VLMs are stronger on multilingual text and visually rich layouts.
  • Task-aware prompting has mixed outcomes, improving some document types while harming others, highlighting the need for careful prompt selection.
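The selection guidance in the points above can be sketched as a simple router. This is a hypothetical illustration of the paper's empirical findings, not code from DISCO; the trait flags and function name are assumptions.

```python
# Hypothetical router based on the paper's guidance; the trait flags
# (handwritten, multi_page, multilingual, visually_rich) are illustrative.

def choose_pipeline(doc: dict) -> str:
    """Return 'ocr' or 'vlm' based on coarse document traits."""
    # OCR pipelines: explicit text grounding helps handwriting
    # and long or multi-page, text-heavy documents.
    if doc.get("handwritten") or doc.get("multi_page"):
        return "ocr"
    # VLMs: stronger on multilingual text and visually rich
    # layouts such as infographics.
    if doc.get("multilingual") or doc.get("visually_rich"):
        return "vlm"
    # No strong signal: either approach may work; default to VLM here.
    return "vlm"

print(choose_pipeline({"handwritten": True}))   # ocr
print(choose_pipeline({"multilingual": True}))  # vlm
```

In practice the routing signal would come from cheap upstream detectors (page count, script identification, layout analysis) rather than hand-set flags.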

Abstract

Document intelligence requires accurate text extraction and reliable reasoning over document content. We introduce **DISCO**, a *Document Intelligence Suite for COmparative Evaluation*, which evaluates optical character recognition (OCR) pipelines and vision-language models (VLMs) separately on parsing and question answering across diverse document types, including handwritten text, multilingual scripts, medical forms, infographics, and multi-page documents. Our evaluation shows that performance varies substantially across tasks and document characteristics, underscoring the need for complexity-aware approach selection. OCR pipelines are generally more reliable for handwriting and for long or multi-page documents, where explicit text grounding supports text-heavy reasoning, while VLMs perform better on multilingual text and visually rich layouts. Task-aware prompting yields mixed effects, improving performance on some document types while degrading it on others. These findings provide empirical guidance for selecting document processing strategies based on document structure and reasoning demands.