Text2Arch: A Dataset for Generating Scientific Architecture Diagrams from Natural Language Descriptions

arXiv cs.CL / April 17, 2026


Key Points

  • The paper introduces Text2Arch, a new large-scale open-access dataset for generating scientific architecture diagrams from natural-language descriptions.
  • It explains a pipeline that uses language models to convert text into intermediate code (DOT) that can then be used to render high-fidelity diagrams.
  • Because prior datasets were lacking, the authors provide paired resources including scientific architecture images, corresponding text, and DOT code representations.
  • They fine-tune multiple small language models on the dataset and also evaluate in-context learning with GPT-4o, finding that Text2Arch-based models outperform baselines like DiagramAgent and match GPT-4o in performance.
  • The dataset, code, and trained models are released publicly, enabling further research and open-model development for text-to-diagram tasks.

Abstract

Communicating complex system designs or scientific processes through text alone is inefficient and prone to ambiguity. A system that automatically generates scientific architecture diagrams from text with high semantic fidelity can be useful in multiple applications, such as enterprise architecture visualization, AI-driven software design, and educational content creation. Hence, in this paper, we focus on leveraging language models to perform semantic understanding of the input text description and generate intermediate code that can be processed to render high-fidelity architecture diagrams. Unfortunately, no clean large-scale open-access dataset exists, which means there are no effective open models for this task. Hence, we contribute a comprehensive dataset, Text2Arch, comprising scientific architecture images, their corresponding textual descriptions, and associated DOT code representations. Leveraging this resource, we fine-tune a suite of small language models, and also perform in-context learning using GPT-4o. Through extensive experimentation, we show that Text2Arch models significantly outperform existing baseline models like DiagramAgent and perform on par with in-context learning-based generations from GPT-4o. We make the code, data, and models publicly available.
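
To make the text-to-DOT intermediate step concrete, here is a minimal sketch of what a paired sample in such a pipeline might look like. The description, node names, and DOT code below are invented for illustration; they are not actual Text2Arch dataset samples, and the paper's models would produce the DOT code from the description automatically.

```python
# Illustrative text -> DOT pairing (hypothetical sample, not from Text2Arch).
description = (
    "An encoder processes the input text and passes embeddings to a "
    "decoder, which emits DOT code that Graphviz renders into a diagram."
)

# DOT code a fine-tuned model might emit for the description above.
dot_code = """\
digraph architecture {
    rankdir=LR;
    node [shape=box];
    "Input Text" -> "Encoder" -> "Decoder" -> "DOT Code" -> "Graphviz" -> "Diagram";
}
"""

# Persist the intermediate representation; rendering would then be done
# externally, e.g.:  dot -Tpng architecture.dot -o architecture.png
with open("architecture.dot", "w") as f:
    f.write(dot_code)

# Sanity check: well-formed DOT starts with a graph declaration.
print(dot_code.lstrip().startswith("digraph"))
```

Because the intermediate DOT code is plain text, standard language-model training and evaluation tooling applies directly, and any Graphviz installation can turn model outputs into rendered diagrams.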