SynthPID: P&ID digitization from Topology-Preserving Synthetic Data
arXiv cs.CV / 4/21/2026
Key Points
- The paper tackles a data bottleneck in P&ID (piping and instrumentation diagram) digitization, the task of converting P&IDs into structured process graphs: engineering drawings are proprietary, and the public benchmark contains only 12 annotated images.
- Prior template-based synthetic augmentation performed poorly because it scatters symbols at random, producing unrealistic graph topology; models trained only on such synthetic data reach about 33% edge detection accuracy.
- SynthPID introduces a topology-preserving synthetic dataset by seeding pipe connectivity directly from real drawings, enabling training without using any real P&IDs.
- Using a patch-based Relationformer adapted for high-resolution diagrams, training on SynthPID alone reaches 63.8 ± 3.1% edge mAP on PID2Graph OPEN100, within 8 percentage points of a real-data oracle.
- A controlled comparison against template-based generation confirms that synthetic generation quality, not model architecture choice, is the key driver. A scaling study suggests that improvements level off beyond ~400 synthetic images because seed diversity is limited.
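
To make the contrast between the two generation strategies concrete, here is a minimal Python sketch. It is not the authors' pipeline: the `Diagram` type, the function names, and the jitter-only perturbation are all hypothetical, chosen only to illustrate the idea that a topology-preserving variant reuses the seed drawing's pipe connectivity while a random-scatter baseline invents both positions and connections.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagram:
    """Hypothetical toy P&ID: symbols are graph nodes, pipes are edges."""
    symbols: dict[int, tuple[str, float, float]]  # node id -> (class, x, y)
    pipes: frozenset[tuple[int, int]]             # undirected, stored as (low id, high id)


def degree_sequence(d: Diagram) -> list[int]:
    """Sorted node degrees, a cheap fingerprint of the graph topology."""
    deg = {n: 0 for n in d.symbols}
    for a, b in d.pipes:
        deg[a] += 1
        deg[b] += 1
    return sorted(deg.values())


def topology_preserving_variant(seed: Diagram, rng: random.Random,
                                jitter: float = 20.0) -> Diagram:
    """Perturb symbol positions but reuse the seed's pipe connectivity unchanged."""
    symbols = {
        n: (cls, x + rng.uniform(-jitter, jitter), y + rng.uniform(-jitter, jitter))
        for n, (cls, x, y) in seed.symbols.items()
    }
    return Diagram(symbols, seed.pipes)


def random_scatter_variant(seed: Diagram, rng: random.Random,
                           size: float = 1000.0) -> Diagram:
    """Baseline: place symbols uniformly at random and draw random pipes.

    Assumes the seed has no more pipes than there are distinct node pairs.
    """
    symbols = {
        n: (cls, rng.uniform(0, size), rng.uniform(0, size))
        for n, (cls, _, _) in seed.symbols.items()
    }
    ids = sorted(symbols)
    pipes: set[tuple[int, int]] = set()
    while len(pipes) < len(seed.pipes):
        a, b = rng.sample(ids, 2)
        pipes.add((min(a, b), max(a, b)))
    return Diagram(symbols, frozenset(pipes))
```

In this toy setting, every topology-preserving variant has exactly the seed's edge set (and therefore its degree sequence), whereas the random-scatter baseline only matches the number of symbols and pipes. The sketch also hints at why gains might saturate: variants derived from the same seed share one topology, so diversity is bounded by the number of distinct seeds.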