TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering

arXiv cs.AI / 4/7/2026


Key Points

  • The paper argues that existing multi-turn table question answering methods suffer from accumulated representation errors caused by fixed text serialization across turns.
  • It proposes TABQAWORLD, a training-free multimodal table reasoning framework that dynamically switches between visual and textual representations to improve table state readout reliability.
  • TABQAWORLD also improves planning by using table metadata (e.g., dimensions, data types, key values) to safely optimize stepwise reasoning trajectories and compress low-complexity actions.
  • Experiments report state-of-the-art results, including +4.87% accuracy versus baselines and +33.35% inference latency reduction, outperforming static representation settings.
  • The work aims to make multi-turn table reasoning practical to deploy by reducing error accumulation as well as the number of conversation turns and inference latency.
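The summary does not say how the selection policy is implemented. One way to picture it is an argmax: for each action, pick the representation with the highest estimated readout reliability. This is a rough illustration only. In the sketch, `Action`, `select_representation`, `toy_reliability` and every score are hypothetical assumptions. They are not the authors' code or results.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical representation labels; the paper's actual interfaces are not given here.
REPRESENTATIONS = ("text", "image")


@dataclass
class Action:
    name: str           # illustrative action type, e.g. "lookup" or "aggregate"
    n_cells_read: int   # how many table cells the action needs to inspect


# A reliability estimator maps (action, representation) -> score in [0, 1].
ReliabilityFn = Callable[[Action, str], float]


def select_representation(action: Action, reliability: ReliabilityFn) -> str:
    """Action-conditioned selection: return the representation with the highest
    estimated readout reliability (ties go to the earlier entry in REPRESENTATIONS)."""
    return max(REPRESENTATIONS, key=lambda rep: reliability(action, rep))


def toy_reliability(action: Action, rep: str) -> float:
    """Placeholder scores, NOT from the paper: assume text readout degrades as
    more cells are read, while image readout stays constant."""
    if rep == "text":
        return 1.0 - 0.01 * action.n_cells_read
    return 0.8


print(select_representation(Action("lookup", 5), toy_reliability))
print(select_representation(Action("aggregate", 40), toy_reliability))
```

Under these toy scores, a 5-cell lookup is read as text and a 40-cell aggregation as an image. In practice, estimating reliability is the hard part, and this sketch simply takes it as given.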

Abstract

Multimodal reasoning has emerged as a powerful framework for enhancing the capabilities of reasoning models. While multi-turn table reasoning methods have improved reasoning accuracy through tool use and reward modeling, they rely on fixed text serialization for table state readouts. This introduces representation errors in table encoding that accumulate significantly over multiple turns. Tabular grounding methods alleviate such accumulation at the expense of inference compute and cost, rendering real-world deployment impractical. To address this, we introduce TABQAWORLD, a table reasoning framework that jointly optimizes tabular actions through representation and estimation. For representation, TABQAWORLD employs an action-conditioned multimodal selection policy, which dynamically switches between visual and textual representations to maximize table state readout reliability. For estimation, TABQAWORLD optimizes the stepwise reasoning trajectory using table metadata, including dimensions, data types, and key values, safely planning trajectories and compressing low-complexity actions to reduce conversation turns and latency. TABQAWORLD is training-free, and empirical evaluations show that it achieves state-of-the-art performance: a 4.87% accuracy improvement over baselines, plus a 5.42% accuracy gain and a 33.35% inference latency reduction over static settings, establishing a new standard for reliable and efficient table reasoning.
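The abstract describes the estimation step only at a high level. One hypothetical way to picture it has three parts:

1. Extract metadata (dimensions, column types, a few key values).
2. Reject plan steps that reference unknown columns.
3. Pack consecutive low-complexity steps into a single turn.

The names `TableMetadata`, `extract_metadata`, `Step`, `step_complexity` and `compress_plan` are illustrative assumptions, as are the cost model and the threshold. None of them is the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TableMetadata:
    n_rows: int
    n_cols: int
    dtypes: Dict[str, str]            # column name -> inferred type
    key_values: Dict[str, List[str]]  # column name -> first few distinct values


def infer_dtype(values: List[str]) -> str:
    """Label a column "number" if every cell parses as a float, else "text"."""
    def is_number(v: str) -> bool:
        try:
            float(v)
            return True
        except ValueError:
            return False
    return "number" if values and all(is_number(v) for v in values) else "text"


def extract_metadata(header: List[str], rows: List[List[str]], k: int = 3) -> TableMetadata:
    columns = {h: [r[i] for r in rows] for i, h in enumerate(header)}
    dtypes = {h: infer_dtype(vals) for h, vals in columns.items()}
    # Keep the first k distinct values per column, in order of appearance.
    key_values = {h: list(dict.fromkeys(vals))[:k] for h, vals in columns.items()}
    return TableMetadata(len(rows), len(header), dtypes, key_values)


@dataclass
class Step:
    action: str   # hypothetical action name, e.g. "lookup" or "sum"
    column: str


def step_complexity(step: Step, meta: TableMetadata) -> int:
    """Hypothetical cost model: a numeric aggregation costs one unit per row;
    any other step counts as a single cell read."""
    if step.action in ("sum", "mean") and meta.dtypes.get(step.column) == "number":
        return meta.n_rows
    return 1


def compress_plan(plan: List[Step], meta: TableMetadata, threshold: int = 2) -> List[List[Step]]:
    """Validate the plan against the metadata, then group consecutive
    low-complexity steps into one turn; each costly step gets its own turn."""
    for step in plan:
        if step.column not in meta.dtypes:
            raise ValueError(f"unknown column: {step.column}")
    turns: List[List[Step]] = []
    batch: List[Step] = []
    for step in plan:
        if step_complexity(step, meta) < threshold:
            batch.append(step)
        else:
            if batch:
                turns.append(batch)
                batch = []
            turns.append([step])
    if batch:
        turns.append(batch)
    return turns


header = ["city", "population"]
rows = [["Oslo", "709000"], ["Bergen", "291000"], ["Trondheim", "212000"]]
meta = extract_metadata(header, rows)
plan = [Step("filter", "city"), Step("lookup", "population"),
        Step("sum", "population"), Step("lookup", "city")]
turns = compress_plan(plan, meta)
print([[s.action for s in t] for t in turns])
```

With this three-row table, the four-step plan needs only three turns. The two leading single-cell steps share one turn, and the row-wise sum runs on its own.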