InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information

arXiv cs.CL / 5/4/2026


Key Points

  • InterChart is a new diagnostic benchmark for evaluating how vision-language models (VLMs) perform multi-chart visual reasoning tasks relevant to areas like scientific reporting and finance.
  • The benchmark moves beyond single, uniform charts by including diverse question types such as entity inference, trend correlation, numerical estimation, and multi-step abstract reasoning across 2–3 related charts.
  • InterChart is structured into three tiers of increasing difficulty: (1) factual reasoning over individual charts, (2) integrative analysis across synthetically aligned chart sets, and (3) semantic inference over visually complex, real-world chart pairs.
  • The authors’ evaluation shows that VLM accuracy drops sharply as chart complexity increases, and that decomposition of multi-entity charts into simpler units improves performance, indicating weaknesses in cross-chart integration.
  • Overall, InterChart is positioned as a rigorous framework to identify systematic limitations and guide progress on multimodal reasoning in complex multi-visual settings.
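To make the tiered evaluation concrete, here is a minimal, hypothetical sketch of how per-tier accuracy might be tallied for such a benchmark. The tier names, record format, and toy data are illustrative assumptions, not the paper's actual schema or results.

```python
# Hypothetical sketch: scoring a tiered multi-chart benchmark like InterChart.
# Tier labels and the (tier, is_correct) record format are assumptions for
# illustration only.
from collections import defaultdict

def tier_accuracy(records):
    """Compute accuracy per difficulty tier from (tier, is_correct) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for tier, is_correct in records:
        total[tier] += 1
        correct[tier] += int(is_correct)
    return {tier: correct[tier] / total[tier] for tier in total}

# Toy predictions mirroring the reported trend: accuracy declines as the
# tier grows more complex (single chart -> synthetic sets -> real-world pairs).
records = [
    ("tier1_single_chart", True), ("tier1_single_chart", True),
    ("tier2_synthetic_sets", True), ("tier2_synthetic_sets", False),
    ("tier3_real_world_pairs", False), ("tier3_real_world_pairs", False),
]
print(tier_accuracy(records))
```

A harness like this makes the paper's headline finding easy to surface: comparing the same model's tier dictionaries before and after decomposing multi-entity charts would quantify the reported gains from simpler visual units.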

Abstract

We introduce InterChart, a diagnostic benchmark that evaluates how well vision-language models (VLMs) reason across multiple related charts, a task central to real-world applications such as scientific reporting, financial analysis, and public policy dashboards. Unlike prior benchmarks focusing on isolated, visually uniform charts, InterChart challenges models with diverse question types ranging from entity inference and trend correlation to numerical estimation and abstract multi-step reasoning grounded in 2-3 thematically or structurally related charts. We organize the benchmark into three tiers of increasing difficulty: (1) factual reasoning over individual charts, (2) integrative analysis across synthetically aligned chart sets, and (3) semantic inference over visually complex, real-world chart pairs. Our evaluation of state-of-the-art open- and closed-source VLMs reveals consistent and steep accuracy declines as chart complexity increases. We find that models perform better when we decompose multi-entity charts into simpler visual units, underscoring their struggles with cross-chart integration. By exposing these systematic limitations, InterChart provides a rigorous framework for advancing multimodal reasoning in complex, multi-visual environments.