DenTab: A Dataset for Table Recognition and Visual QA on Real-World Dental Estimates

arXiv cs.CV / 4/20/2026

Key Points

  • The paper introduces DenTab, a new dataset of 2,000 real-world dental estimate table image crops with high-quality HTML annotations, aiming to better reflect noisy administrative capture conditions.
  • DenTab supports both table recognition (TR) and table visual question answering (TableVQA) on the same inputs, totaling 2,208 questions across 11 categories including retrieval, aggregation, and logic/consistency checks.
  • The authors benchmark 16 systems (14 VLMs plus two OCR baselines) and find that strong structure recovery does not consistently translate into reliable answers on multi-step arithmetic and consistency questions, and that these failures persist even when models are given ground-truth HTML tables.
  • To improve arithmetic reliability without training, they propose the Table Router Pipeline, which routes arithmetic questions to deterministic execution: a VLM generates a structured table representation and a constrained table program, and a rule-based executor performs exact computation over the parsed table.
  • The dataset and code are planned for public release on GitHub to enable more realistic evaluation and research on reasoning over tables.
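
To make the HTML annotation format concrete, here is a minimal sketch of how an HTML table annotation could be expanded into a rectangular grid, with merged cells repeated across the slots they cover. The DenTab release is not yet public, so the class `TableGridParser`, the function `html_to_grid`, and the `SAMPLE` table (including its procedure names and fees) are hypothetical illustrations, not the dataset's actual schema or tooling.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect <td>/<th> cells row by row, together with their spans."""

    def __init__(self):
        super().__init__()
        self.rows = []      # each row: list of (text, rowspan, colspan)
        self._cell = None   # text fragments of the open cell, or None
        self._spans = (1, 1)

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            self._spans = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())  # normalize whitespace
            self.rows[-1].append((text, *self._spans))
            self._cell = None


def html_to_grid(html):
    """Return a list of rows; merged cells are copied into every slot they span."""
    parser = TableGridParser()
    parser.feed(html)
    grid = {}  # (row, col) -> text
    for r, cells in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in cells:
            while (r, c) in grid:  # skip slots claimed by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Hypothetical annotation, invented for illustration only.
SAMPLE = """
<table>
  <tr><th>Procedure</th><th>Qty</th><th>Fee</th></tr>
  <tr><td rowspan="2">Crown</td><td>1</td><td>450.00</td></tr>
  <tr><td>1</td><td>120.00</td></tr>
  <tr><td colspan="2">Total</td><td>570.00</td></tr>
</table>
"""

for row in html_to_grid(SAMPLE):
    print(row)
```

Repeating merged-cell text across the slots it covers is one simple design choice; a table recognition metric may instead need the original span attributes preserved.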

Abstract

Tables condense key transactional and administrative information into compact layouts, but practical extraction requires more than text recognition: systems must also recover structure (rows, columns, merged cells, headers) and interpret roles such as line items, subtotals, and totals under common capture artifacts. Many existing resources for table structure recognition and TableVQA are built from clean digital-born sources or rendered tables, and therefore only partially reflect noisy administrative conditions. We introduce DenTab, a dataset of 2,000 cropped table images from dental estimates with high-quality HTML annotations, enabling evaluation of table recognition (TR) and table visual question answering (TableVQA) on the same inputs. DenTab includes 2,208 questions across eleven categories spanning retrieval, aggregation, and logic/consistency checks. We benchmark 16 systems, including 14 vision-language models (VLMs) and two OCR baselines. Across models, strong structure recovery does not consistently translate into reliable performance on multi-step arithmetic and consistency questions, and these reasoning failures persist even when using ground-truth HTML table inputs. To improve arithmetic reliability without training, we propose the Table Router Pipeline, which routes arithmetic questions to deterministic execution. The pipeline combines (i) a VLM that produces a baseline answer, a structured table representation, and a constrained table program with (ii) a rule-based executor that performs exact computation over the parsed table. The source code and dataset will be made publicly available at https://github.com/hamdilaziz/DenTab.
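
The routing idea in the last sentences can be sketched roughly as follows. The abstract does not specify the program format or the routing rule, so everything here is an assumption for illustration: the dictionary layout of `vlm_output`, the operation set `ARITHMETIC_OPS`, and the functions `execute` and `route` are hypothetical names, and the table values are invented.

```python
from decimal import Decimal

# Assumed set of operations that count as "arithmetic" for routing purposes.
ARITHMETIC_OPS = {"sum", "max", "min", "count"}


def execute(program, rows):
    """Rule-based executor: exact decimal computation over parsed table rows."""
    op, column = program["op"], program["column"]
    where = program.get("where", {})
    selected = [r for r in rows if all(r.get(k) == v for k, v in where.items())]
    values = [Decimal(r[column]) for r in selected]
    if op == "count":
        return Decimal(len(values))
    if not values:
        raise ValueError("no rows match the program's filter")
    if op == "sum":
        return sum(values, Decimal("0"))
    if op == "max":
        return max(values)
    if op == "min":
        return min(values)
    raise ValueError(f"unsupported op: {op}")


def route(vlm_output):
    """Send arithmetic programs to the executor; otherwise keep the VLM's answer."""
    program = vlm_output.get("program")
    if program and program.get("op") in ARITHMETIC_OPS:
        try:
            return str(execute(program, vlm_output["table"]))
        except (KeyError, ValueError, ArithmeticError):
            pass  # malformed program or unparsable cell: fall back below
    return vlm_output["baseline_answer"]


# Hypothetical VLM output: the baseline answer is deliberately wrong.
arithmetic_case = {
    "baseline_answer": "560.00",
    "table": [
        {"item": "Crown", "fee": "450.00"},
        {"item": "Filling", "fee": "120.00"},
        {"item": "X-ray", "fee": "35.50"},
    ],
    "program": {"op": "sum", "column": "fee"},
}
retrieval_case = {"baseline_answer": "Crown", "table": [], "program": None}

print(route(arithmetic_case))  # exact sum computed by the executor
print(route(retrieval_case))   # no program, so the baseline answer is returned
```

Using `Decimal` rather than floats keeps currency sums exact, and falling back to the baseline answer when the program cannot be executed means a malformed program never produces a worse result than the VLM alone.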