DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams

arXiv cs.AI / 5/5/2026


Key Points

  • DiagramNet introduces the first multimodal dataset tailored to system-level chip architecture diagrams, addressing the lack of standardized symbols and structured training data.
  • The dataset includes 10,977 connection annotations and 15,515 chain-of-thought QA pairs spanning four tasks: Listing, Localization, Connection, and Circuit QA.
  • The authors propose a progressive training pipeline and a decoupled multi-agent workflow that splits visual understanding into Perception, Reasoning, and Knowledge stages.
  • On the DiagramNet benchmark, a 3B-parameter model with the workflow reportedly outperforms the 2025 EDA Elite Challenge winner and exceeds GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by more than 2x in end-to-end evaluation.
  • The workflow is claimed to generalize across models, yielding large Task 1 gains (e.g., 128.7x with Gemini-2.5-Pro) and enabling effective transfer to AMSBench with only 60 adaptation images, outperforming Netlistify in connectivity reasoning.

Abstract

System-level diagrams encode the architectural blueprint of chip design, specifying module functions, dataflows, and interface protocols. However, non-standardized symbols and the scarcity of structured training data hinder existing multimodal large language models (MLLMs) from recognizing these diagrams. To address this gap, we introduce DiagramNet, the first multimodal dataset for system-level diagrams, comprising 10,977 connection annotations and 15,515 chain-of-thought QA pairs across four tasks: Listing, Localization, Connection, and Circuit QA. Building on this dataset, we propose a progressive training pipeline together with a decoupled multi-agent workflow that decomposes complex visual reasoning into Perception, Reasoning, and Knowledge stages. On the DiagramNet benchmark, integrating our 3B-parameter model with the proposed workflow surpasses the 2025 EDA Elite Challenge winner and outperforms GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by over 2x in end-to-end evaluation. Notably, the workflow generalizes beyond our model, boosting Task 1 performance by 128.7x for Gemini-2.5-Pro and 12.4x for GPT-5. Furthermore, with only 60 images for detector adaptation, the method transfers effectively to AMSBench, achieving zero-shot connectivity reasoning on par with GPT-5 and Claude-Sonnet-4 while surpassing the AMS state-of-the-art method Netlistify.
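The abstract's decoupled workflow splits diagram understanding into three stages: Perception (detect modules), Reasoning (infer connections), and Knowledge (answer circuit QA). A minimal sketch of that staged decomposition is below; the stage interfaces, the `Module` data shape, and the left-to-right linking heuristic are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical sketch of a Perception -> Reasoning -> Knowledge pipeline.
# All names and heuristics here are assumptions for illustration only.

@dataclass
class Module:
    name: str
    bbox: tuple  # (x, y, w, h) in image coordinates

def perception(image) -> list[Module]:
    """Stage 1: detect module blocks and labels (stub returns fixed boxes)."""
    return [Module("CPU", (10, 10, 80, 40)), Module("DDR", (120, 10, 80, 40))]

def reasoning(modules: list[Module]) -> list[tuple[str, str]]:
    """Stage 2: infer directed connections between detected modules.
    Stub heuristic: chain modules left-to-right by bounding-box x position."""
    ordered = sorted(modules, key=lambda m: m.bbox[0])
    return [(a.name, b.name) for a, b in zip(ordered, ordered[1:])]

def knowledge(connections: list[tuple[str, str]], question: str) -> str:
    """Stage 3: answer a circuit QA question from the structured connections."""
    if "connected" in question:
        return "; ".join(f"{src} -> {dst}" for src, dst in connections)
    return "unknown"

modules = perception(image=None)                           # Perception
edges = reasoning(modules)                                 # Reasoning
answer = knowledge(edges, "Which modules are connected?")  # Knowledge
print(answer)  # CPU -> DDR
```

The point of the decomposition is that each stage consumes structured output from the previous one, so the reasoning and QA stages never operate directly on raw pixels, which is what the paper credits for the workflow's transferability across backbone models.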