DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams

arXiv cs.AI / 5/5/2026


Key Points

  • DiagramNet introduces the first multimodal dataset tailored to system-level chip architecture diagrams, addressing the lack of standardized symbols and structured training data.
  • The dataset includes 10,977 connection annotations and 15,515 chain-of-thought QA pairs spanning four tasks: Listing, Localization, Connection, and Circuit QA.
  • The authors propose a progressive training pipeline and a decoupled multi-agent workflow that splits visual understanding into Perception, Reasoning, and Knowledge stages.
  • On the DiagramNet benchmark, a 3B-parameter model with the workflow reportedly outperforms the 2025 EDA Elite Challenge winner and exceeds GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by more than 2x in end-to-end evaluation.
  • The workflow is claimed to generalize across models, yielding large Task 1 gains (e.g., 128.7x with Gemini-2.5-Pro) and enabling effective transfer to AMSBench with only 60 adaptation images, outperforming Netlistify in connectivity reasoning.

Abstract

System-level diagrams encode the architectural blueprint of chip design, specifying module functions, dataflows, and interface protocols. However, non-standardized symbols and the scarcity of structured training data hinder existing multimodal large language models (MLLMs) from recognizing these diagrams. To address this gap, we introduce DiagramNet, the first multimodal dataset for system-level diagrams, comprising 10,977 connection annotations and 15,515 chain-of-thought QA pairs across four tasks: Listing, Localization, Connection, and Circuit QA. Building on this dataset, we propose a progressive training pipeline together with a decoupled multi-agent workflow that decomposes complex visual reasoning into Perception, Reasoning, and Knowledge stages. On the DiagramNet benchmark, integrating our 3B-parameter model with the proposed workflow surpasses the 2025 EDA Elite Challenge winner and outperforms GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by over 2x in end-to-end evaluation. Notably, the workflow generalizes beyond our model, boosting Task 1 performance by 128.7x for Gemini-2.5-Pro and 12.4x for GPT-5. Furthermore, with only 60 images for detector adaptation, the method transfers effectively to AMSBench, achieving zero-shot connectivity reasoning on par with GPT-5 and Claude-Sonnet-4 while surpassing the AMS state-of-the-art method Netlistify.
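The abstract's decoupled workflow splits diagram understanding into three stages: Perception (detect modules), Reasoning (infer connections), and Knowledge (answer circuit QA). A minimal sketch of that staged decomposition is below; the stage interfaces, the `Module` data shape, and the left-to-right linking heuristic are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical sketch of a Perception -> Reasoning -> Knowledge pipeline.
# All names and heuristics here are assumptions for illustration only.

@dataclass
class Module:
    name: str
    bbox: tuple  # (x, y, w, h) in image coordinates

def perception(image) -> list[Module]:
    """Stage 1: detect module blocks and labels (stub returns fixed boxes)."""
    return [Module("CPU", (10, 10, 80, 40)), Module("DDR", (120, 10, 80, 40))]

def reasoning(modules: list[Module]) -> list[tuple[str, str]]:
    """Stage 2: infer directed connections between detected modules.
    Stub heuristic: chain modules left-to-right by bounding-box x position."""
    ordered = sorted(modules, key=lambda m: m.bbox[0])
    return [(a.name, b.name) for a, b in zip(ordered, ordered[1:])]

def knowledge(connections: list[tuple[str, str]], question: str) -> str:
    """Stage 3: answer a circuit QA question from the structured connections."""
    if "connected" in question:
        return "; ".join(f"{src} -> {dst}" for src, dst in connections)
    return "unknown"

modules = perception(image=None)                           # Perception
edges = reasoning(modules)                                 # Reasoning
answer = knowledge(edges, "Which modules are connected?")  # Knowledge
print(answer)  # CPU -> DDR
```

The point of the decomposition is that each stage consumes structured output from the previous one, so the reasoning and QA stages never operate directly on raw pixels, which is what the paper credits for the workflow's transferability across backbone models.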