Do Multilingual VLMs Reason Equally? A Cross-Lingual Visual Reasoning Audit for Indian Languages
arXiv cs.CL / 3/31/2026
Key Points
- The paper introduces what it claims is the first cross-lingual visual reasoning audit for multiple Indian languages using 980 translated questions across MathVista, ScienceQA, and MMMU.
- Using IndicTrans2 for translation and Gemini 2.0 Flash for cross-verification on sample sets, the authors report solid inter-translator agreement (0.79–0.84) before evaluating eight vision-language models across seven languages.
- Results show a substantial accuracy drop of 9.8–25 percentage points when moving from English to Indian languages, with Dravidian languages suffering drops up to 13.2 pp larger than Indo-Aryan languages.
- Chain-of-thought prompting generally harms performance for Bengali and Kannada rather than improving it, suggesting that many “reasoning chains” are English-centric.
- Even a multilingual VLM (Aya-Vision-8B) still shows a large drop (28.5 pp) on Dravidian scripts, and the authors release the benchmark plus all model outputs.
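The headline numbers above are per-language accuracy gaps measured against an English baseline. A minimal sketch of that bookkeeping is below; the record layout (`(lang, correct)` pairs) and language codes are illustrative assumptions, not the paper's actual data schema.

```python
# Sketch of a per-language accuracy audit with an English baseline.
# The (lang, correct) record format is a hypothetical stand-in for
# the released model outputs, not the authors' schema.
from collections import defaultdict

def accuracy_by_language(records):
    """Return {language: accuracy} from (lang, correct) records."""
    totals = defaultdict(lambda: [0, 0])  # lang -> [num_correct, num_seen]
    for lang, correct in records:
        totals[lang][0] += int(correct)
        totals[lang][1] += 1
    return {lang: c / n for lang, (c, n) in totals.items()}

def drop_vs_english(acc, baseline="en"):
    """English-to-target accuracy drop, in percentage points."""
    base = acc[baseline]
    return {lang: round((base - a) * 100, 1)
            for lang, a in acc.items() if lang != baseline}

# Toy example: 75% accuracy in English vs. 25% in Kannada -> 50.0 pp drop.
records = [
    ("en", True), ("en", True), ("en", True), ("en", False),
    ("kn", True), ("kn", False), ("kn", False), ("kn", False),
]
print(drop_vs_english(accuracy_by_language(records)))  # {'kn': 50.0}
```

Reporting gaps in percentage points (a simple difference of accuracies) rather than relative percent is what makes figures like "9.8–25 pp" directly comparable across benchmarks with different baseline accuracies.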


