Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation

arXiv cs.CV · April 27, 2026

📰 News · Models & Research

Key Points

  • The paper highlights that existing text-to-image (T2I) models still lack reliability for knowledge-intensive tasks where domain knowledge, structural constraints, and symbolic conventions must be strictly followed.
  • It introduces KVBench, a curriculum-grounded benchmark with 1,800 expert-curated prompts across six high-school subjects, sourced from 30+ authoritative textbooks, to evaluate scientific and logical correctness.
  • Evaluations of 14 state-of-the-art open- and closed-source T2I models show notable weaknesses in logical reasoning, symbolic precision, and multilingual robustness, with open-source models generally trailing proprietary ones.
  • To improve scientific fidelity, the authors propose KE-Check, a two-stage approach that enriches structured prompts through knowledge elaboration and then refines outputs using a checklist-driven constraint-violation and editing loop.
  • The dataset and code for KVBench are released publicly to support further research and benchmarking.

Abstract

Recent text-to-image (T2I) models have demonstrated impressive capabilities in photorealistic synthesis and instruction following. However, their reliability in knowledge-intensive settings remains largely unexplored. Unlike natural image generation, knowledge visualization requires not only semantic alignment but also strict adherence to domain knowledge, structural constraints, and symbolic conventions, exposing a critical gap between visual plausibility and scientific correctness. To systematically study this problem, we introduce KVBench, a curriculum-grounded benchmark for evaluating knowledge-intensive T2I generation. KVBench covers six senior high-school subjects: Biology, Chemistry, Geography, History, Mathematics, and Physics. The benchmark consists of 1,800 expert-curated prompts derived from over 30 authoritative textbooks. Using this benchmark, we evaluate 14 state-of-the-art open- and closed-source models, revealing substantial deficiencies in logical reasoning, symbolic precision, and multilingual robustness, with open-source models consistently underperforming proprietary systems. To address these limitations, we further propose KE-Check, a two-stage framework that improves scientific fidelity via (1) Knowledge Elaboration for structured prompt enrichment, and (2) Checklist-Guided Refinement for explicit constraint enforcement through violation identification and constraint-guided editing. KE-Check effectively mitigates scientific hallucinations, narrowing the performance gap between open-source and leading closed-source models. Data and code are publicly available at https://github.com/zhaoran66/KVBench.
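The two-stage control flow described above can be sketched as a small driver loop. This is an illustrative sketch only, not the authors' implementation: every callable here (`elaborate`, `generate`, `build_checklist`, `find_violations`, `edit`) is a hypothetical stand-in for a model or verifier component, and the round budget is an assumed hyperparameter.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Checklist:
    """Explicit constraints distilled from the enriched prompt
    (e.g. required labels, symbolic conventions, structural rules)."""
    constraints: List[str]


def ke_check(prompt: str,
             elaborate: Callable[[str], str],
             generate: Callable[[str], str],
             build_checklist: Callable[[str], Checklist],
             find_violations: Callable[[str, Checklist], List[str]],
             edit: Callable[[str, List[str]], str],
             max_rounds: int = 3) -> str:
    """Hypothetical KE-Check-style pipeline: enrich the prompt with
    domain knowledge, then iteratively repair constraint violations."""
    # Stage 1: Knowledge Elaboration -- expand the terse user prompt
    # with domain facts, structural constraints, and conventions.
    rich_prompt = elaborate(prompt)
    image = generate(rich_prompt)

    # Stage 2: Checklist-Guided Refinement -- derive a checklist from
    # the enriched prompt, then loop: identify violations and apply
    # constraint-guided edits until the image passes or budget runs out.
    checklist = build_checklist(rich_prompt)
    for _ in range(max_rounds):
        violations = find_violations(image, checklist)
        if not violations:
            break
        image = edit(image, violations)
    return image
```

In practice each callable would wrap an LLM or a T2I/editing model; the loop structure (verify, then edit only on failure) is what keeps refinement targeted at the violated constraints rather than regenerating from scratch.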