OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward

arXiv cs.AI / 4/8/2026


Key Points

  • OmniDiagram is presented as a unified framework for programmable diagram generation that supports multiple diagram code languages and a wider range of task definitions than prior work.
  • The paper introduces “Visual Interrogation Verifies All” (ViVA), a reinforcement learning feedback strategy that evaluates the visual structure of rendered diagrams rather than relying on brittle syntax rules or pixel-level matching.
  • ViVA works by actively generating targeted visual inquiries to interrogate diagram fidelity, producing fine-grained signals that enable a self-evolving training loop without requiring manually annotated ground-truth code.
  • The authors also release M³Diagram, described as the first large-scale diagram code generation dataset, containing over 196k high-quality instances.
  • Experiments report that combining supervised fine-tuning (SFT) with ViVA-based RL yields new state-of-the-art results on diagram code generation benchmarks.
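The article describes ViVA only at a high level: it generates targeted visual inquiries about a rendered diagram and scores fidelity from the answers. A minimal sketch of that idea is below, assuming a hypothetical spec format and a mock answer source in place of a real vision-language model; all function names and the question templates are illustrative, not the paper's actual interface.

```python
def generate_inquiries(spec: dict) -> list[tuple[str, str]]:
    """Turn a diagram spec into targeted (question, expected_answer) pairs.

    In ViVA these inquiries would be generated by a model; here they are
    templated from a toy spec with 'nodes' and 'edges' keys (hypothetical).
    """
    inquiries = []
    for node in spec.get("nodes", []):
        inquiries.append((f"Does the diagram contain a node labeled '{node}'?", "yes"))
    for src, dst in spec.get("edges", []):
        inquiries.append((f"Is there an arrow from '{src}' to '{dst}'?", "yes"))
    return inquiries


def viva_reward(rendered_answers: dict[str, str], spec: dict) -> float:
    """Score a rendered diagram as the fraction of inquiries answered correctly.

    `rendered_answers` stands in for a VQA model queried on the rendered image;
    a real pipeline would render the generated code and ask the model directly.
    """
    inquiries = generate_inquiries(spec)
    if not inquiries:
        return 0.0
    correct = sum(1 for question, expected in inquiries
                  if rendered_answers.get(question, "no") == expected)
    return correct / len(inquiries)


# Toy usage: a two-node flow where the rendered arrow is missing.
spec = {"nodes": ["Start", "End"], "edges": [("Start", "End")]}
questions = generate_inquiries(spec)
answers = {q: "yes" for q, _ in questions}
answers[questions[2][0]] = "no"          # the edge inquiry fails
reward = viva_reward(answers, spec)       # 2 of 3 inquiries pass
```

Because the reward is derived from the rendered output rather than from a reference program, a signal of this shape needs no manually annotated ground-truth code, which is what enables the self-evolving loop the authors describe.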

Abstract

The paradigm of programmable diagram generation is evolving rapidly, playing a crucial role in structured visualization. However, most existing studies are confined to a narrow range of task formulations and language support, constraining their applicability to diverse diagram types. In this work, we propose OmniDiagram, a unified framework that incorporates diverse diagram code languages and task definitions. To address the challenge of aligning code logic with visual fidelity in Reinforcement Learning (RL), we introduce a novel visual feedback strategy named Visual Interrogation Verifies All (ViVA). Unlike brittle syntax-based rules or pixel-level matching, ViVA rewards the visual structure of rendered diagrams through a generative approach. Specifically, ViVA actively generates targeted visual inquiries to scrutinize diagram visual fidelity and provides fine-grained feedback for optimization. This mechanism facilitates a self-evolving training process, effectively obviating the need for manually annotated ground truth code. Furthermore, we construct M³Diagram, the first large-scale diagram code generation dataset, containing over 196k high-quality instances. Experimental results confirm that the combination of SFT and our ViVA-based RL allows OmniDiagram to establish a new state-of-the-art (SOTA) across diagram code generation benchmarks.
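The abstract does not specify the RL update used with the ViVA reward. One common way to consume a scalar, verifier-style reward over several candidate generations is a group-relative baseline, as in GRPO-style methods; the sketch below shows that normalization step only, with hypothetical names, not the paper's confirmed recipe.

```python
def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize per-candidate scalar rewards within one prompt's group.

    Candidates scoring above the group mean get positive advantage and are
    reinforced; those below get negative advantage. This is the baseline
    trick used by GRPO-style RL, assumed (not confirmed) for illustration.
    """
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = variance ** 0.5
    if std == 0.0:
        std = 1.0  # all candidates tied: zero advantage everywhere
    return [(r - mean) / std for r in rewards]


# Four sampled diagram programs for one prompt, scored by a ViVA-like reward.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Pairing such advantages with per-token log-probabilities gives a standard policy-gradient update, which is one plausible reading of how a visual-interrogation reward could drive the SFT-then-RL pipeline the authors report.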