DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
arXiv cs.CL / 4/29/2026
Key Points
- The article introduces DV-World, a new benchmark with 260 tasks aimed at evaluating data visualization (DV) agents under real-world professional conditions rather than overly constrained lab setups.
- DV-World covers three areas: native spreadsheet/dashboard manipulation (DV-Sheet), adapting visual artifacts to new data and programming paradigms (DV-Evolution), and proactive intent alignment using a user simulator (DV-Interact).
- It addresses limitations of prior benchmarks by avoiding code-sandbox confinement, supporting more realistic multi-step workflows, and challenging agents with ambiguous requirements instead of assuming perfect intent.
- The proposed hybrid evaluation pairs Table-value Alignment, which checks numerical precision, with rubric-guided MLLM-as-a-Judge scoring for semantic and visual quality.
- Initial experiments show that state-of-the-art models score below 50% overall, exposing major gaps in real-world DV capability and motivating further work toward enterprise-ready agents.
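
To make the hybrid evaluation idea concrete, here is a minimal Python sketch of how a numerical table check and a rubric-based judgment might be combined into one score. It is an illustration under stated assumptions, not the benchmark's actual implementation: the function names (`table_value_alignment`, `rubric_score`, `hybrid_score`), the cell-keyed dictionary format, the tolerance, and the equal weighting are all hypothetical, and the rubric verdicts are passed in as plain booleans rather than produced by a real MLLM call.

```python
import math

def table_value_alignment(expected, actual, rel_tol=1e-6):
    """Fraction of expected table cells matched by the agent's output.

    `expected` and `actual` map a cell key, e.g. (row, column), to a value.
    This keying scheme is an assumption made for the sketch.
    """
    if not expected:
        return 1.0
    matched = 0
    for key, want in expected.items():
        got = actual.get(key)
        if got is None:
            continue  # missing cell counts as a mismatch
        if isinstance(want, (int, float)) and isinstance(got, (int, float)):
            # Compare numbers with a tolerance instead of exact equality.
            if math.isclose(want, got, rel_tol=rel_tol, abs_tol=1e-9):
                matched += 1
        elif want == got:
            matched += 1
    return matched / len(expected)

def rubric_score(rubric_results):
    """Fraction of rubric items judged as satisfied.

    `rubric_results` is a list of booleans standing in for the verdicts
    an MLLM judge would return for each rubric item.
    """
    if not rubric_results:
        return 0.0
    return sum(1 for ok in rubric_results if ok) / len(rubric_results)

def hybrid_score(expected, actual, rubric_results, weight=0.5):
    """Weighted mix of both signals; the 50/50 weight is an assumption."""
    tva = table_value_alignment(expected, actual)
    judge = rubric_score(rubric_results)
    return weight * tva + (1 - weight) * judge

# Hypothetical example: one of two chart values is wrong,
# and three of four rubric items pass.
expected = {("Q1", "revenue"): 100.0, ("Q2", "revenue"): 120.0}
actual = {("Q1", "revenue"): 100.0, ("Q2", "revenue"): 125.0}
score = hybrid_score(expected, actual, [True, True, False, True])
print(score)
```

Keeping the two signals as separate functions means a chart that looks right but plots wrong numbers, or the reverse, is penalized on the relevant axis before the scores are merged.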