Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation

arXiv cs.CV · April 27, 2026

📰 News · Models & Research

Key Points

  • The paper highlights that existing text-to-image (T2I) models still lack reliability for knowledge-intensive tasks where domain knowledge, structural constraints, and symbolic conventions must be strictly followed.
  • It introduces KVBench, a curriculum-grounded benchmark with 1,800 expert-curated prompts across six high-school subjects, sourced from 30+ authoritative textbooks, to evaluate scientific and logical correctness.
  • Evaluations of 14 state-of-the-art open- and closed-source T2I models show notable weaknesses in logical reasoning, symbolic precision, and multilingual robustness, with open-source models generally trailing proprietary ones.
  • To improve scientific fidelity, the authors propose KE-Check, a two-stage approach that enriches structured prompts through knowledge elaboration and then refines outputs using a checklist-driven constraint-violation and editing loop.
  • The dataset and code for KVBench are released publicly to support further research and benchmarking.

Abstract

Recent text-to-image (T2I) models have demonstrated impressive capabilities in photorealistic synthesis and instruction following. However, their reliability in knowledge-intensive settings remains largely unexplored. Unlike natural image generation, knowledge visualization requires not only semantic alignment but also strict adherence to domain knowledge, structural constraints, and symbolic conventions, exposing a critical gap between visual plausibility and scientific correctness. To systematically study this problem, we introduce KVBench, a curriculum-grounded benchmark for evaluating knowledge-intensive T2I generation. KVBench covers six senior high-school subjects: Biology, Chemistry, Geography, History, Mathematics, and Physics. The benchmark consists of 1,800 expert-curated prompts derived from over 30 authoritative textbooks. Using this benchmark, we evaluate 14 state-of-the-art open- and closed-source models, revealing substantial deficiencies in logical reasoning, symbolic precision, and multilingual robustness, with open-source models consistently underperforming proprietary systems. To address these limitations, we further propose KE-Check, a two-stage framework that improves scientific fidelity via (1) Knowledge Elaboration for structured prompt enrichment, and (2) Checklist-Guided Refinement for explicit constraint enforcement through violation identification and constraint-guided editing. KE-Check effectively mitigates scientific hallucinations, narrowing the performance gap between open-source and leading closed-source models. Data and code are publicly available at https://github.com/zhaoran66/KVBench.
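The two-stage control flow described above can be sketched as a small driver loop. This is an illustrative sketch only, not the authors' implementation: every callable here (`elaborate`, `generate`, `build_checklist`, `find_violations`, `edit`) is a hypothetical stand-in for a model or verifier component, and the round budget is an assumed hyperparameter.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Checklist:
    """Explicit constraints distilled from the enriched prompt
    (e.g. required labels, symbolic conventions, structural rules)."""
    constraints: List[str]


def ke_check(prompt: str,
             elaborate: Callable[[str], str],
             generate: Callable[[str], str],
             build_checklist: Callable[[str], Checklist],
             find_violations: Callable[[str, Checklist], List[str]],
             edit: Callable[[str, List[str]], str],
             max_rounds: int = 3) -> str:
    """Hypothetical KE-Check-style pipeline: enrich the prompt with
    domain knowledge, then iteratively repair constraint violations."""
    # Stage 1: Knowledge Elaboration -- expand the terse user prompt
    # with domain facts, structural constraints, and conventions.
    rich_prompt = elaborate(prompt)
    image = generate(rich_prompt)

    # Stage 2: Checklist-Guided Refinement -- derive a checklist from
    # the enriched prompt, then loop: identify violations and apply
    # constraint-guided edits until the image passes or budget runs out.
    checklist = build_checklist(rich_prompt)
    for _ in range(max_rounds):
        violations = find_violations(image, checklist)
        if not violations:
            break
        image = edit(image, violations)
    return image
```

In practice each callable would wrap an LLM or a T2I/editing model; the loop structure (verify, then edit only on failure) is what keeps refinement targeted at the violated constraints rather than regenerating from scratch.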