DiagramBank: A Large-scale Dataset of Diagram Design Exemplars with Paper Metadata for Retrieval-Augmented Generation
arXiv cs.AI / 4/25/2026
Key Points
- The paper introduces DiagramBank, a large-scale dataset of 89,422 schematic scientific diagrams paired with paper metadata to support retrieval-augmented generation of publication-quality figures.
- DiagramBank targets a key bottleneck in end-to-end “AI scientist” systems: producing teaser/strategic diagrams, which such systems currently either leave out or replace with low-quality plots.
- The dataset is built via an automated curation pipeline that extracts figures and their in-text figure references, then uses a CLIP-based filter to separate schematic diagrams from standard plots and natural images.
- Each diagram instance is linked with contextual text (e.g., from abstract and caption) plus figure-reference pairs, enabling retrieval at multiple query granularities.
- The authors release DiagramBank in an index-ready format along with a retrieval-augmented generation codebase to demonstrate exemplar-conditioned teaser figure synthesis.
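To make the curation and retrieval steps in the key points concrete, here is a minimal Python sketch of the general idea, not the authors' implementation. It assumes a CLIP-style zero-shot filter (each figure is assigned to the text prompt whose embedding is closest) followed by retrieval over a chosen text field. The three-dimensional vectors are toy stand-ins for real CLIP embeddings, bag-of-words cosine similarity stands in for whatever retriever the released codebase uses, and the prompt labels, record fields (`abstract`, `caption`, `reference`), and function names are all hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def zero_shot_label(image_vec, prompt_vecs):
    """Pick the prompt label whose embedding is closest to the image embedding."""
    return max(prompt_vecs, key=lambda label: cosine(image_vec, prompt_vecs[label]))

def bow(text):
    """Lowercased bag-of-words counts (a stand-in for a learned text encoder)."""
    return Counter(text.lower().split())

def bow_cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(records, query, field, k=1):
    """Rank records by similarity between the query and one text field."""
    q = bow(query)
    ranked = sorted(records, key=lambda r: bow_cosine(q, bow(r[field])), reverse=True)
    return ranked[:k]

# Toy prompt embeddings; a real pipeline would obtain these from a CLIP text encoder.
prompt_vecs = {
    "schematic diagram": [1.0, 0.0, 0.1],
    "plot": [0.0, 1.0, 0.1],
    "natural image": [0.1, 0.1, 1.0],
}

# Hypothetical figure records with toy image embeddings and context text.
figures = [
    {"id": "A", "vec": [0.9, 0.1, 0.2],
     "abstract": "we propose a retrieval augmented pipeline for figure generation",
     "caption": "overview of the retrieval pipeline",
     "reference": "figure 1 shows the encoder and retriever stages"},
    {"id": "B", "vec": [0.1, 0.95, 0.05],
     "abstract": "we benchmark optimizers on image classification",
     "caption": "validation accuracy over training epochs",
     "reference": "figure 3 plots accuracy against epochs"},
    {"id": "C", "vec": [0.8, 0.2, 0.1],
     "abstract": "a transformer model for molecular property prediction",
     "caption": "architecture of the graph transformer",
     "reference": "as shown in figure 2 attention layers aggregate atom features"},
]

# Step 1: keep only figures classified as schematic diagrams.
kept = [f for f in figures if zero_shot_label(f["vec"], prompt_vecs) == "schematic diagram"]

# Step 2: query the surviving diagrams at different granularities.
by_caption = retrieve(kept, "graph transformer architecture", field="caption")
by_abstract = retrieve(kept, "retrieval pipeline for figure generation", field="abstract")
```

Choosing the field at query time is one simple way to get multi-granularity retrieval: a caption-level query favors diagrams that look like what is being asked for, while an abstract-level query favors diagrams from papers on a similar topic.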