Can GRPO Boost Complex Multimodal Table Understanding?

arXiv cs.CL / 3/27/2026


Key Points

  • The paper argues that multimodal table understanding is hindered by complex table layouts and demanding logical reasoning: supervised fine-tuning (SFT) is the dominant approach, while reinforcement learning (RL) has struggled with low initial policy accuracy and coarse rewards.
  • It proposes Table-R1, a three-stage reinforcement learning framework that combines a warm-up stage, Perception Alignment GRPO (PA-GRPO) with continuous Tree-Edit-Distance Similarity (TEDS) rewards, and Hint-Completion GRPO (HC-GRPO) using fine-grained, hint-guided residual-step rewards.
  • Experiments on both held-in and held-out datasets show Table-R1 improves table reasoning performance beyond SFT and standard GRPO, indicating that the staged design effectively addresses initialization bottlenecks and reward sparsity.
  • A key result is that Qwen2-VL-7B with Table-R1 surpasses larger table-specific models such as Table-LLaVA 13B, and reaches performance comparable to the closed-source GPT-4o on held-in datasets.
  • Overall, the work suggests GRPO-style RL can be made substantially more effective for multimodal table understanding through tailored reward shaping and multi-phase training.
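The group-relative mechanism that gives GRPO its name can be sketched in a few lines: each question is answered by a group of sampled responses, and every response's advantage is its reward normalized against the group's own mean and standard deviation, with no learned value critic. This is a generic illustration of standard GRPO, not code from the paper.

```python
import statistics


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for one group of sampled responses.

    Each response's reward is centered on the group mean and scaled by the
    group standard deviation (plus a small eps to avoid division by zero),
    so the policy update pushes toward better-than-average responses.
    """
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# With binary correctness rewards, advantages split cleanly around zero:
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

One consequence visible here is the paper's motivation for continuous rewards: if the initial policy almost never produces a correct answer, every group reward is 0, the advantages vanish, and no learning signal flows, which is exactly the cold-start and reward-sparsity problem the warm-up stage and TEDS reward are designed to mitigate.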

Abstract

Existing table understanding methods face challenges due to complex table structures and intricate logical reasoning. While supervised fine-tuning (SFT) dominates existing research, reinforcement learning (RL), such as Group Relative Policy Optimization (GRPO), has shown promise but struggled with low initial policy accuracy and coarse rewards in tabular contexts. In this paper, we introduce Table-R1, a three-stage RL framework that enhances multimodal table understanding through: (1) a warm-up stage that elicits initial perception and reasoning capabilities, (2) Perception Alignment GRPO (PA-GRPO), which employs continuous Tree-Edit-Distance Similarity (TEDS) rewards for recognizing table structures and contents, and (3) Hint-Completion GRPO (HC-GRPO), which utilizes fine-grained rewards on the residual steps of hint-guided questions. Extensive experiments demonstrate that Table-R1 noticeably boosts table reasoning performance on both held-in and held-out datasets, substantially outperforming SFT and vanilla GRPO. Notably, Qwen2-VL-7B with Table-R1 surpasses larger table-specific models (e.g., Table-LLaVA 13B), even achieving performance comparable to the closed-source GPT-4o on held-in datasets, demonstrating the efficacy of each stage of Table-R1 in overcoming initialization bottlenecks and reward sparsity, thereby advancing robust multimodal table understanding.
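The PA-GRPO reward rests on the idea that a partially correct table recognition should earn partial credit. True TEDS computes a tree edit distance over the table's HTML tree; the minimal sketch below substitutes a sequence-similarity ratio over flattened cells as a simplified proxy for that tree comparison, purely to show the shape of a continuous structure-aware reward in [0, 1]. The function names and the flattening scheme are illustrative assumptions, not the paper's implementation.

```python
from difflib import SequenceMatcher


def flatten_table(table):
    """Flatten a table (list of rows of cell strings) into a token
    sequence, inserting a row-boundary token so that row structure,
    not just cell content, influences the similarity score."""
    seq = []
    for row in table:
        seq.extend(row)
        seq.append("<row>")
    return seq


def teds_like_reward(pred_table, gold_table):
    """Continuous similarity in [0, 1] between predicted and gold tables.

    A simplified stand-in for TEDS: instead of tree edit distance over
    the HTML tree, it scores the longest matching subsequences of the
    flattened cell streams via difflib's ratio.
    """
    a = flatten_table(pred_table)
    b = flatten_table(gold_table)
    return SequenceMatcher(None, a, b).ratio()
```

A perfect reconstruction scores 1.0, while a table with one wrong cell still earns most of the reward, so a weak initial policy receives a gradient signal long before it can reproduce tables exactly, which is the property that makes this reward denser than binary exact-match.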