Benchmarking PDF Parsers on Table Extraction with LLM-based Semantic Evaluation
arXiv cs.CV / 3/20/2026
Key Points
- The authors present a benchmarking framework for PDF table extraction that uses synthetically generated PDFs pairing precise LaTeX ground truth with realistic tables sourced from arXiv, capturing diversity and complexity (a generation sketch follows this list).
- A central contribution is the use of LLMs as judges for the semantic evaluation of tables, integrated into a matching pipeline that tolerates inconsistencies in parser outputs (see the judge sketch after this list).
- In a human validation study with over 1,500 quality judgments, the LLM-based evaluation correlates substantially better with human judgment (Pearson r=0.93) than TEDS (r=0.68) or GriTS (r=0.70); the correlation sketch below shows how such agreement is computed.
- Evaluating 21 contemporary PDF parsers on 100 synthetic documents containing 451 tables reveals notable performance disparities and yields practical guidance for choosing a parser for tabular data extraction (see the harness sketch below).
- The work provides a reproducible, scalable evaluation methodology and makes code and data available on GitHub for broader adoption.
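
To make the ground-truth idea concrete, here is a minimal Python sketch of synthetic table generation: because the LaTeX source is produced from a known data structure, every cell value is available as exact ground truth before the PDF is ever rendered. The function and data below are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of the synthetic ground-truth idea (illustrative, not the
# paper's pipeline): the same `rows` object that generates the LaTeX source
# doubles as machine-readable ground truth for the rendered PDF.

def table_to_latex(rows: list[list[str]]) -> str:
    """Render a list of rows as a LaTeX tabular environment."""
    ncols = len(rows[0])
    spec = "|".join("l" for _ in range(ncols))
    body = " \\\\\n".join(" & ".join(cells) for cells in rows)
    return (
        f"\\begin{{tabular}}{{{spec}}}\n"
        f"{body} \\\\\n"
        f"\\end{{tabular}}"
    )

ground_truth = [
    ["Parser", "Precision", "Recall"],
    ["A", "0.91", "0.87"],
    ["B", "0.83", "0.90"],
]
latex_source = table_to_latex(ground_truth)
print(latex_source)  # compile with pdflatex to obtain a benchmark PDF
```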
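The judge step can be sketched as follows. `call_llm` is a hypothetical stand-in for any chat-completion client, and the prompt wording is ours rather than the paper's; the point is that the judge scores semantic fidelity while being told to ignore superficial formatting differences.

```python
# Sketch of the LLM-as-judge step: the judge receives a ground-truth table
# and a parser's extraction, both serialized as text, and returns a graded
# similarity score. `call_llm` is a hypothetical stand-in, and the prompt
# is our own illustration, not the paper's.

JUDGE_PROMPT = """You are comparing two tables.
Ground truth:
{truth}

Parser output:
{parsed}

Rate how well the parser output preserves the ground truth's content and
structure, from 0 (unusable) to 5 (semantically identical). Do not penalize
minor formatting differences such as whitespace or cell quoting.
Answer with a single integer."""

def judge_table(truth: str, parsed: str, call_llm) -> float:
    """Ask an LLM judge to score one (ground truth, extraction) pair."""
    reply = call_llm(JUDGE_PROMPT.format(truth=truth, parsed=parsed))
    score = int(reply.strip())  # expect a bare integer in 0..5
    return score / 5.0          # normalize to [0, 1]
```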
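Validating a metric against human raters, as the paper's study does, reduces to correlating two score vectors. A minimal example with made-up placeholder numbers (not the paper's data):

```python
# Quantify a metric's agreement with human raters: collect per-table scores
# from the metric and from humans, then compute the Pearson correlation.
# The score values below are placeholders, not the paper's data.

from scipy.stats import pearsonr

human_scores  = [0.9, 0.4, 1.0, 0.2, 0.7, 0.8]    # averaged human judgments
metric_scores = [0.85, 0.5, 0.95, 0.3, 0.6, 0.9]  # e.g. LLM-judge outputs

r, p_value = pearsonr(human_scores, metric_scores)
print(f"Pearson r = {r:.2f} (p = {p_value:.3g})")
# The paper reports r=0.93 for the LLM judge vs. 0.68 (TEDS) and 0.70 (GriTS).
```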
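Finally, a minimal benchmark harness under assumed interfaces (`parsers`, `documents`, and `score_fn` are all hypothetical): run every parser over every document, score each extracted table against its ground truth, and aggregate a mean score per parser.

```python
# Hypothetical harness shape: iterate parsers x documents, score each
# (ground truth, extraction) pair with a pluggable metric, and report the
# mean score per parser.

from statistics import mean
from typing import Callable

def run_benchmark(
    parsers: dict[str, Callable],            # name -> parse(pdf_path) -> tables
    documents: list[dict],                   # each: {"pdf_path": ..., "tables": [...]}
    score_fn: Callable[[str, str], float],   # compares (truth, parsed) pairs
) -> dict[str, float]:
    """Return the mean table score per parser across all documents."""
    scores: dict[str, list[float]] = {name: [] for name in parsers}
    for doc in documents:
        for name, parse in parsers.items():
            extracted = parse(doc["pdf_path"])  # parser-specific extraction
            for truth, parsed in zip(doc["tables"], extracted):
                scores[name].append(score_fn(truth, parsed))
    return {name: mean(vals) for name, vals in scores.items()}
```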