When Visuals Aren't the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations

arXiv cs.AI / 3/25/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper introduces a benchmark to evaluate Vision-Language Models (VLMs) on misleading visualization–caption pairs, covering both reasoning errors (e.g., cherry-picking, causal inference) and visualization design errors (e.g., truncated or dual axes, inappropriate encodings); see the sketch after this list.
  • It uses real-world charts combined with human-authored, curated misleading captions to isolate which specific error types models fail to detect.
  • Across evaluations of many commercial and open-source VLMs, the study finds models are more reliable at identifying visual design deception than reasoning-based misinformation.
  • The research also observes a tendency for VLMs to misclassify non-deceptive visualizations as misleading, suggesting weaknesses in precision and attribution.
  • Overall, the work aims to close the gap between generically detecting misleading content and pinpointing the exact reasoning or visualization error responsible for the deception.
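
To make the two error modalities concrete, here is a minimal sketch of the taxonomy as a data structure. The grouping into `ReasoningError` and `DesignError` and the Python representation are our illustrative assumptions; the category names are taken from the examples above, and the paper's full taxonomy is finer-grained.

```python
from enum import Enum

class ReasoningError(Enum):
    """Caption-side deception: the chart is sound, but the claim is not (assumed grouping)."""
    CHERRY_PICKING = "Cherry-picking"
    CAUSAL_INFERENCE = "Causal inference"

class DesignError(Enum):
    """Chart-side deception: the visual encoding itself misleads (assumed grouping)."""
    TRUNCATED_AXIS = "Truncated axis"
    DUAL_AXIS = "Dual axis"
    INAPPROPRIATE_ENCODING = "Inappropriate encoding"
```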

Abstract

Visualizations help communicate data insights, but deceptive data representations can distort their interpretation and propagate misinformation. While recent Vision-Language Models (VLMs) perform well on many chart understanding tasks, their ability to detect misleading visualizations, especially when deception arises from subtle reasoning errors in captions, remains poorly understood. Here, we evaluate VLMs on misleading visualization-caption pairs grounded in a fine-grained taxonomy of reasoning errors (e.g., Cherry-picking, Causal inference) and visualization design errors (e.g., Truncated axis, Dual axis, inappropriate encodings). To this end, we develop a benchmark that combines real-world visualizations with human-authored, curated misleading captions designed to elicit specific reasoning and visualization error types, enabling controlled analysis across error categories and modalities of misleadingness. Evaluating many commercial and open-source VLMs, we find that models detect visual design errors substantially more reliably than reasoning-based misinformation, and frequently misclassify non-misleading visualizations as deceptive. Overall, our work fills a gap between coarse detection of misleading content and the attribution of the specific reasoning or visualization errors that give rise to it.
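
An evaluation loop consistent with this setup might look like the following minimal sketch. The `BenchmarkItem` schema, the `classify` callable, and the label strings are hypothetical placeholders of our own, since the paper's actual harness and prompt format are not shown here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

@dataclass
class BenchmarkItem:
    chart_path: str   # real-world visualization image
    caption: str      # human-authored caption, misleading or not
    gold_label: str   # e.g., "Cherry-picking", "Truncated axis", or "not misleading"

def evaluate(
    items: list[BenchmarkItem],
    classify: Callable[[str, str], str],  # (chart_path, caption) -> predicted label
) -> dict[str, float]:
    """Compute per-gold-label accuracy, which separates design-error detection
    from reasoning-error detection and surfaces false positives on clean pairs
    via the "not misleading" label."""
    correct: Counter = Counter()
    total: Counter = Counter()
    for item in items:
        pred = classify(item.chart_path, item.caption)
        total[item.gold_label] += 1
        if pred == item.gold_label:
            correct[item.gold_label] += 1
    return {label: correct[label] / total[label] for label in total}
```

Reporting accuracy per gold label is what lets the findings above be read directly off the results: higher scores on design-error labels than on reasoning-error labels, and a depressed score on the "not misleading" label when models over-flag non-deceptive charts.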