Agentar-Fin-OCR

arXiv cs.CV / 3/12/2026

Key Points

Agentar-Fin-OCR is introduced as a document parsing system tailored to financial-domain documents that converts ultra-long PDFs into semantically consistent, structured outputs with auditing-grade provenance.
For document structure, it combines a Cross-page Contents Consolidation algorithm, which restores continuity across pages, with a Document-level Heading Hierarchy Reconstruction (DHR) module, which builds a globally consistent table of contents (TOC) for structure-aware retrieval. For table parsing, it pairs a difficulty-adaptive curriculum learning strategy with a CellBBoxRegressor that localizes table cells from decoder hidden states without external detectors.
The work introduces FinDocBench, a benchmark covering six financial document categories with expert-verified annotations. Its metrics include TocEDS (edit-distance-based similarity of the table of contents), cross-page concatenated TEDS, and Table Cell IoU (C-IoU).
The authors evaluate a wide range of state-of-the-art models on FinDocBench to assess their capabilities and remaining limitations, and position Agentar-Fin-OCR as a practical foundation for reliable downstream financial document applications.
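
To make the Table Cell IoU metric mentioned in the key points more concrete, here is a minimal Python sketch of intersection over union for axis-aligned cell boxes. It is an illustration only, not the benchmark's implementation: the `(x_min, y_min, x_max, y_max)` box format, the function names `box_iou` and `mean_cell_iou`, and the assumption that predicted and reference cells are already paired one-to-one are all hypothetical choices made for this example.

```python
from typing import Sequence, Tuple

# Assumed box format (not from the paper): (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Overlap width and height, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_cell_iou(pred: Sequence[Box], gold: Sequence[Box]) -> float:
    """Average IoU over cells that are assumed to be paired by index."""
    if len(pred) != len(gold):
        raise ValueError("this sketch assumes one predicted box per reference cell")
    if not gold:
        return 0.0
    return sum(box_iou(p, g) for p, g in zip(pred, gold)) / len(gold)
```

A real evaluation would also need a rule for matching predicted cells to reference cells (for example by row and column index) and for penalizing missing or extra cells. The source summary does not specify how FinDocBench handles those cases.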

Abstract

In this paper, we propose Agentar-Fin-OCR, a document parsing system tailored to financial-domain documents, transforming ultra-long financial PDFs into semantically consistent, highly accurate, structured outputs with auditing-grade provenance. To address finance-specific challenges such as complex layouts, cross-page structural discontinuities, and cell-level referencing capability, Agentar-Fin-OCR combines (1) a Cross-page Contents Consolidation algorithm to restore continuity across pages and a Document-level Heading Hierarchy Reconstruction (DHR) module to build a globally consistent Table of Contents (TOC) tree for structure-aware retrieval, and (2) a difficulty-adaptive curriculum learning training strategy for table parsing, together with a CellBBoxRegressor module that uses structural anchor tokens to localize table cells from decoder hidden states without external detectors. Experiments demonstrate that our model shows high performance on the table parsing metrics of OmniDocBench. To enable realistic evaluation in the financial vertical, we further introduce FinDocBench, a benchmark that includes six financial document categories with expert-verified annotations and evaluation metrics including Table of Contents edit-distance-based similarity (TocEDS), cross-page concatenated TEDS, and Table Cell Intersection over Union (C-IoU). We evaluate a wide range of state-of-the-art models on FinDocBench to assess their capabilities and remaining limitations on financial documents. Overall, Agentar-Fin-OCR and FinDocBench provide a practical foundation for reliable downstream financial document applications.
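
The abstract describes TocEDS only as an edit-distance-based similarity over the table of contents. One plausible reading, sketched below in Python, flattens each TOC tree into a sequence of `(level, title)` entries and scores `1 - distance / max_length` using Levenshtein distance. This normalization, the entry format, and the function names `edit_distance` and `toc_similarity` are assumptions for illustration, not the paper's definition.

```python
from typing import Sequence, Tuple

# Assumed entry format (not from the paper): (heading level, heading title)
TocEntry = Tuple[int, str]


def edit_distance(a: Sequence[TocEntry], b: Sequence[TocEntry]) -> int:
    """Levenshtein distance between two sequences of TOC entries."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def toc_similarity(pred: Sequence[TocEntry], gold: Sequence[TocEntry]) -> float:
    """Similarity in [0, 1]; 1.0 means the flattened TOCs are identical."""
    longest = max(len(pred), len(gold))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pred, gold) / longest


gold = [(1, "Overview"), (2, "Risk Factors"), (2, "Financial Statements")]
pred = [(1, "Overview"), (1, "Risk Factors"), (2, "Financial Statements")]
score = toc_similarity(pred, gold)  # one heading has the wrong level
```

In this sketch an entry counts as matching only when both level and title agree exactly, so a heading placed at the wrong depth is treated as one substitution. A softer variant could compare titles with a string similarity instead of exact equality.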