Feynman: Knowledge-Infused Diagramming Agent for Scalable Visual Designs

arXiv cs.AI / 3/16/2026

Key Points

  • Feynman presents a scalable diagram generation pipeline that uses domain-specific knowledge components ('ideas'), code planning, and declarative programs to author diagrams and grounded captions.
  • The diagrams are rendered by the Penrose diagramming system with optimization-based rendering that preserves visual semantics while injecting randomness to enhance layout diversity.
  • The approach enables the creation of a large, well-aligned diagram-caption dataset (>100k pairs) and a visual-language benchmark called Diagramma to evaluate visual reasoning in vision-language models.
  • The authors plan to release the dataset, benchmark, and the full agent pipeline as open-source, aiming to reduce cost and time in diagram authoring and evaluation.

Abstract

Visual design is an essential application of state-of-the-art multi-modal AI systems. Improving these systems requires high-quality vision-language data at scale. Despite the abundance of internet image and text data, knowledge-rich and well-aligned image-text pairs are rare. In this paper, we present a scalable diagram generation pipeline built with our agent, Feynman. To create diagrams, Feynman first enumerates domain-specific knowledge components ("ideas") and performs code planning based on those ideas. Given the plan, Feynman translates the ideas into simple declarative programs and iterates to receive feedback and visually refine the diagrams. Finally, the declarative programs are rendered by the Penrose diagramming system. The optimization-based rendering of Penrose preserves the visual semantics while injecting fresh randomness into the layout, thereby producing diagrams with both visual consistency and diversity. As a result, Feynman can author diagrams along with grounded captions at very little cost and in little time. Using Feynman, we synthesized a dataset with more than 100k well-aligned diagram-caption pairs. We also curate a vision-language benchmark, Diagramma, from freshly generated data. Diagramma can be used for evaluating the visual reasoning capabilities of vision-language models. We plan to release the dataset, benchmark, and the full agent pipeline as an open-source project.
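The staged pipeline the abstract describes (enumerate ideas, plan, translate to a declarative program, refine on feedback, render with a fresh random seed) can be sketched as a minimal Python skeleton. Everything below is an assumption for illustration: the function names, data shapes, and stub bodies are not the authors' released code, and the real agent would call an LLM and the Penrose renderer at the stubbed points.

```python
# Hypothetical sketch of the Feynman pipeline stages; all names and
# data shapes are assumptions, not the authors' actual implementation.
from dataclasses import dataclass


@dataclass
class Diagram:
    program: str   # declarative (Penrose-style) program text
    caption: str   # natural-language caption grounded in the same ideas


def enumerate_ideas(domain: str) -> list[str]:
    # Stage 1: enumerate domain-specific knowledge components ("ideas").
    # Stub: a real agent would prompt an LLM with the target domain.
    return [f"{domain}: idea {i}" for i in range(3)]


def plan_code(ideas: list[str]) -> str:
    # Stage 2: code planning over the selected ideas.
    return "plan covering: " + "; ".join(ideas)


def translate_to_program(plan: str) -> Diagram:
    # Stage 3: translate the plan into a simple declarative program,
    # emitting a grounded caption alongside it.
    return Diagram(program=f"-- program for ({plan})",
                   caption=f"Diagram illustrating {plan}")


def refine(diagram: Diagram, rounds: int = 2) -> Diagram:
    # Stage 4: iterate on visual feedback (stubbed as a no-op loop);
    # a real agent would render, inspect, and edit the program each round.
    for _ in range(rounds):
        pass
    return diagram


def render(diagram: Diagram, seed: int) -> str:
    # Stage 5: optimization-based rendering (Penrose in the paper);
    # varying the seed injects layout randomness while the declarative
    # program pins down the visual semantics.
    return f"svg(seed={seed}, program={diagram.program!r})"


def generate(domain: str, seed: int = 0) -> tuple[str, str]:
    """Run the full pipeline and return (rendered output, caption)."""
    ideas = enumerate_ideas(domain)
    diagram = refine(translate_to_program(plan_code(ideas)))
    return render(diagram, seed), diagram.caption
```

Because the caption is derived from the same plan as the program, every rendered diagram ships with an aligned description, which is what makes the >100k diagram-caption dataset cheap to scale: re-running `generate` with a new seed yields a new layout of the same semantics.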