A2Z-10M+: Geometric Deep Learning with A-to-Z BRep Annotations for AI-Assisted CAD Modeling and Reverse Engineering

arXiv cs.CV / 3/16/2026

Key Points

  • The A2Z-10M+ dataset compiles 10 million multi-modal annotations for 1 million ABC CAD models, enabling BRep-aware learning for AI-assisted CAD modeling and reverse engineering.
  • It pairs high-resolution meshes and 3D hand-drawn sketches with geometric and topological information about BRep co-edges, corners, and surfaces, plus textual captions describing the parts.
  • The dataset requires roughly 5 terabytes of storage; its scale, quality, and diversity are assessed using novel metrics, GPT-5, Gemini, and extensive human feedback.
  • An additional 25,000 professionally designed electronic enclosure CAD models are merged into the dataset to broaden real-world coverage.
  • A foundation model was trained and benchmarked on a subset of 150,000 CAD models to detect BRep co-edges and corner vertices from 3D scans, showcasing a key downstream task.

Abstract

Reverse engineering and rapid prototyping of computer-aided design (CAD) models from 3D scans, sketches, or simple text prompts are vital in industrial product design. However, recent geometric deep learning techniques lack a multi-modal understanding of the parametric CAD features stored in a model's boundary representation (BRep). This study presents A2Z, the largest compilation of 10 million multi-modal annotations and metadata for 1 million ABC CAD models, unlocking an unprecedented level of BRep learning. A2Z comprises (i) high-resolution meshes with salient 3D scanning features, (ii) 3D hand-drawn sketches equipped with (iii) geometric and topological information about BRep co-edges, corners, and surfaces, and (iv) textual captions and tags describing each product in the mechanical world. Creating such carefully structured, large-scale data, which requires nearly 5 terabytes of storage and enables unparalleled CAD learning and retrieval tasks, is very challenging. The scale, quality, and diversity of our multi-modal annotations are assessed using novel metrics, GPT-5, Gemini, and extensive human feedback mechanisms. To this end, we also merge an additional 25,000 CAD models of electronic enclosures (e.g., tablets, ports) designed by skilled professionals into our A2Z dataset. Subsequently, we train and benchmark a foundation model on a subset of 150K CAD models to detect BRep co-edges and corner vertices from 3D scans, a key downstream task in CAD reverse engineering. The annotated dataset, metrics, and checkpoints will be publicly released to support numerous research directions.
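To make the four annotation modalities concrete, the record below sketches how one A2Z entry might be organized. All field names, paths, and values here are illustrative assumptions; the paper does not publish its schema.

```python
from dataclasses import dataclass, field

@dataclass
class BRepAnnotation:
    """Hypothetical per-model annotation record (assumed schema, not the dataset's)."""
    model_id: str                                  # ABC CAD model identifier
    mesh_path: str                                 # (i) high-resolution mesh with scanning features
    sketch_path: str                               # (ii) 3D hand-drawn sketch
    co_edges: list = field(default_factory=list)   # (iii) BRep co-edge geometry/topology
    corners: list = field(default_factory=list)    # (iii) corner vertex coordinates
    surfaces: list = field(default_factory=list)   # (iii) parametric surface descriptors
    caption: str = ""                              # (iv) textual description of the part

# Example record with placeholder values.
record = BRepAnnotation(
    model_id="abc_0000001",
    mesh_path="meshes/abc_0000001.obj",
    sketch_path="sketches/abc_0000001.ply",
    corners=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    caption="Rectangular mounting bracket with two corner holes.",
)
print(record.model_id, len(record.corners))
```

A structure along these lines would let the foundation-model task described in the abstract be framed as supervised detection: the mesh is the input, and the co-edge and corner fields are the targets.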