Benchmarking PDF Parsers on Table Extraction with LLM-based Semantic Evaluation
arXiv cs.CV / 3/20/2026
Key Points
- The paper presents a benchmarking framework for PDF table extraction that combines synthetically generated PDFs with precise LaTeX ground truth and realistic tables sourced from arXiv, capturing both diversity and complexity.
- A central contribution is the use of LLMs as judges for semantic evaluation of tables, integrated into a matching pipeline that tolerates inconsistencies in parser outputs.
- In a human validation study with over 1,500 quality judgments, the LLM-based evaluation shows substantially higher correlation with human judgment (Pearson r=0.93) than TEDS (r=0.68) and GriTS (r=0.70).
- Evaluating 21 contemporary PDF parsers across 100 synthetic documents containing 451 tables reveals notable performance disparities and yields practical guidance for selecting parsers for tabular data extraction.
- The work provides a reproducible, scalable evaluation methodology and makes code and data available on GitHub for broader adoption.
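The agreement figures above are Pearson correlations between automatic metric scores and human quality judgments, computed over paired per-table ratings. A minimal sketch of that computation (the score lists here are made-up illustrations, not the paper's data):

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired ratings for the same extracted tables:
# human quality judgments vs. scores from an automatic metric.
human = [0.9, 0.2, 0.7, 0.4, 1.0]
metric = [0.8, 0.3, 0.6, 0.5, 0.9]
print(round(pearson_r(human, metric), 3))
```

In the paper's validation study, this correlation is what separates the LLM-based judge (r=0.93) from structural metrics like TEDS (r=0.68) and GriTS (r=0.70): the closer r is to 1, the more the metric's ranking of table quality tracks the human one.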