InstructTable: Improving Table Structure Recognition Through Instructions

arXiv cs.CV / 4/6/2026

Key Points

  • The paper presents InstructTable, an instruction-guided multi-stage training framework to improve table structure recognition (TSR) for complex layouts with merged or empty cells.
  • It combines table instruction pre-training to boost learning of fine-grained structural patterns with TSR fine-tuning to preserve strong visual information modeling.
  • To support large-scale training and evaluation, the authors propose Table Mix Expand (TME), a template-free method for synthesizing authentic tabular data.
  • Using TME, they build the Balanced Complex Dense Synthetic Tables (BCDSTab) benchmark of 900 complex synthetic table images and report that InstructTable achieves state-of-the-art TSR performance on FinTabNet, PubTabNet, MUSTARD, and BCDSTab.
  • Ablation experiments indicate that tabular-data-specific instructions and the synthetic data generation approach both contribute positively to accuracy.

Abstract

Table structure recognition (TSR) holds widespread practical importance by parsing tabular images into structured representations, yet encounters significant challenges when processing complex layouts involving merged or empty cells. Traditional visual-centric models rely exclusively on visual information while lacking crucial semantic support, thereby impeding accurate structural recognition in complex scenarios. Vision-language models leverage contextual semantics to enhance comprehension; however, these approaches underemphasize the modeling of visual structural information. To address these limitations, this paper introduces InstructTable, an instruction-guided multi-stage training TSR framework. Meticulously designed table instruction pre-training directs attention toward fine-grained structural patterns, enhancing comprehension of complex tables. Complementary TSR fine-tuning preserves robust visual information modeling, maintaining high-precision table parsing across diverse scenarios. Furthermore, we introduce Table Mix Expand (TME), an innovative template-free method for synthesizing large-scale authentic tabular data. Leveraging TME, we construct the Balanced Complex Dense Synthetic Tables (BCDSTab) benchmark, comprising 900 complex table images synthesized through our method to serve as a rigorous benchmark. Extensive experiments on multiple public datasets (FinTabNet, PubTabNet, MUSTARD) and BCDSTab demonstrate that InstructTable achieves state-of-the-art performance in TSR tasks. Ablation studies further confirm the positive impact of the proposed tabular-data-specific instructions and synthetic data.
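
As a rough illustration of why merged and empty cells complicate the "structured representation" a TSR system must produce, the sketch below expands rows of spanning cells into a dense grid. The `(text, rowspan, colspan)` cell format and the `expand_to_grid` helper are assumptions made for this example only; the abstract does not describe InstructTable's actual output format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cell format for this sketch: (text, rowspan, colspan).
Cell = Tuple[str, int, int]


def expand_to_grid(rows: List[List[Cell]]) -> List[List[Optional[str]]]:
    """Expand rows of (possibly merged) cells into a dense row-by-column grid.

    A merged cell's text is repeated in every slot it covers. An empty cell
    keeps its empty string, while a slot no cell covers becomes None.
    """
    grid: Dict[Tuple[int, int], str] = {}
    n_cols = 0
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip slots already claimed by a cell spanning down from above.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
            n_cols = max(n_cols, c)
    n_rows = max((r for r, _ in grid), default=-1) + 1
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


if __name__ == "__main__":
    table = [
        [("Region", 2, 1), ("Sales", 1, 2)],        # two merged header cells
        [("Q1", 1, 1), ("Q2", 1, 1)],               # sub-headers under "Sales"
        [("North", 1, 1), ("", 1, 1), ("7", 1, 1)],  # one empty cell
    ]
    for line in expand_to_grid(table):
        print(line)
```

In the second row, "Q1" lands in column 1 rather than column 0 because "Region" spans down from the row above. A recognizer has to recover this kind of offset from the image alone.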