INDOTABVQA: A Benchmark for Cross-Lingual Table Understanding in Bahasa Indonesia Documents
arXiv cs.AI / 4/15/2026
Key Points
- INDOTABVQA is introduced as a new benchmark for cross-lingual table visual question answering on real Bahasa Indonesia document images, paired with QA sets in four languages (Bahasa Indonesia, English, Hindi, Arabic).
- The dataset includes 1,593 document images spanning three visual styles and varying table complexity, enabling evaluation in both monolingual and cross-lingual VQA settings.
- Benchmarking shows substantial performance gaps for leading VLMs (including Qwen2.5-VL, Gemma-3, LLaMA-3.2, and GPT-4o), especially on structurally complex tables and in low-resource languages.
- Targeted fine-tuning improves accuracy by 11.6% when fine-tuning a compact 3B model and by 17.8% when LoRA fine-tuning a 7B model, indicating that domain-specific training can meaningfully boost results.
- Adding explicit table region coordinates as extra input yields an additional 4–7% improvement, highlighting the benefit of spatial priors for structure-aware table reasoning.
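To make the last point concrete, here is a minimal sketch of how table region coordinates could be passed to a VLM as extra text input alongside the document image. This summary does not describe the paper's actual input format, so the `TableRegion` class, the `normalize_region` and `build_prompt` helpers, and the prompt wording are all hypothetical and serve only to illustrate the idea of a spatial prior.

```python
from dataclasses import dataclass


@dataclass
class TableRegion:
    """Pixel-space bounding box of a table (hypothetical structure, not from the paper)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def normalize_region(region, image_width, image_height):
    """Scale pixel coordinates to [0, 1] so the box is independent of image resolution."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return (
        round(region.x_min / image_width, 3),
        round(region.y_min / image_height, 3),
        round(region.x_max / image_width, 3),
        round(region.y_max / image_height, 3),
    )


def build_prompt(question, region, image_width, image_height):
    """Prepend the table location to the question (assumed prompt format)."""
    x0, y0, x1, y1 = normalize_region(region, image_width, image_height)
    return (
        f"The table is located at normalized box [{x0}, {y0}, {x1}, {y1}] "
        f"(x_min, y_min, x_max, y_max).\n"
        f"Question: {question}"
    )


# Example: a table occupying the middle of a 1000x800 document image.
example_region = TableRegion(x_min=100, y_min=200, x_max=900, y_max=600)
example_prompt = build_prompt(
    "What is the total in the last row?", example_region, 1000, 800
)
```

The resulting string would be sent together with the image. Normalizing the box is one common design choice because it keeps the coordinates meaningful if the image is resized before being fed to the model.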