CADSmith: Multi-Agent CAD Generation with Programmatic Geometric Validation

arXiv cs.AI / 3/30/2026

Key Points

CADSmith is a multi-agent text-to-CAD system that generates CadQuery code from natural language and improves it via two nested refinement loops.
The outer loop uses programmatic geometric validation combining exact OpenCASCADE measurements (e.g., bounding box dimensions, volume, and solid validity) with higher-level shape assessment from a separate vision-language model (“Judge”).
The approach corrects both execution errors (inner loop) and geometric errors (outer loop), aiming to eliminate the dimensional errors that single-pass or purely visual methods struggle to catch.
CADSmith uses retrieval-augmented generation over up-to-date API documentation rather than fine-tuning, so it can track changes in the underlying CAD library.
On a 100-prompt benchmark with three difficulty tiers and three ablation configurations, CADSmith reports a higher execution rate (100% vs. 95%), a higher median F1 (0.9846 vs. 0.9707) and median IoU (0.9629 vs. 0.8085), and a much lower mean Chamfer Distance (0.74 vs. 28.37) than a zero-shot baseline, indicating more reliable and accurate LLM-generated CAD output.
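The two nested loops described in the points above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Measurements`, `Spec`, `geometric_feedback`, and `refine` are hypothetical names, and the `generate` and `execute` callables stand in for the LLM agent and the CadQuery/OpenCASCADE runtime. The sketch assumes target dimensions are available as a `Spec` with a relative tolerance, which the summary does not state, and it leaves out the vision-language Judge.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Measurements:
    """Hypothetical container for kernel measurements of one solid."""
    bbox: Tuple[float, float, float]  # bounding box extents along X, Y, Z
    volume: float
    is_valid: bool


@dataclass
class Spec:
    """Hypothetical target values; the 1% tolerance is an assumption."""
    bbox: Tuple[float, float, float]
    volume: float
    rel_tol: float = 0.01


def geometric_feedback(m: Measurements, spec: Spec) -> List[str]:
    """Return human-readable issues; an empty list means the check passed."""
    issues = []
    if not m.is_valid:
        issues.append("solid is not valid")
    for axis, got, want in zip("XYZ", m.bbox, spec.bbox):
        if abs(got - want) > spec.rel_tol * abs(want):
            issues.append(f"bbox {axis}: got {got:g}, expected {want:g}")
    if abs(m.volume - spec.volume) > spec.rel_tol * abs(spec.volume):
        issues.append(f"volume: got {m.volume:g}, expected {spec.volume:g}")
    return issues


def refine(
    prompt: str,
    spec: Spec,
    generate: Callable[[str, List[str]], str],
    execute: Callable[[str], Measurements],
    max_outer: int = 3,
    max_inner: int = 3,
) -> Tuple[Optional[str], bool]:
    """Outer loop: geometric validation. Inner loop: execution errors."""
    feedback: List[str] = []
    code: Optional[str] = None
    for _ in range(max_outer):
        code = generate(prompt, feedback)
        measurements = None
        for _ in range(max_inner):
            try:
                measurements = execute(code)
                break  # the code ran, so leave the inner loop
            except Exception as exc:
                # Feed the runtime error back and ask for a repaired script.
                code = generate(prompt, [f"execution error: {exc}"])
        if measurements is None:
            feedback = ["code did not execute within the inner-loop budget"]
            continue
        feedback = geometric_feedback(measurements, spec)
        if not feedback:
            return code, True  # executes and matches the target geometry
    return code, False
```

The point of the design is that the outer loop sends numeric discrepancies back to the generator (for example "bbox Z: got 6, expected 5") rather than a rendered image alone, and that is the kind of dimensional error purely visual feedback tends to miss.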

Abstract

Existing methods for text-to-CAD generation either operate in a single pass with no geometric verification or rely on lossy visual feedback that cannot resolve dimensional errors. We present CADSmith, a multi-agent pipeline that generates CadQuery code from natural language. The generated code is then refined iteratively through two nested correction loops: an inner loop that resolves execution errors and an outer loop grounded in programmatic geometric validation. The outer loop combines exact measurements from the OpenCASCADE kernel (bounding box dimensions, volume, solid validity) with holistic visual assessment from an independent vision-language model (the Judge). This provides both the numerical precision and the high-level shape awareness needed to converge on the correct geometry. The system uses retrieval-augmented generation over API documentation rather than fine-tuning, so its documentation database can be kept current as the underlying CAD library evolves. We evaluate on a custom benchmark of 100 prompts in three difficulty tiers (T1 through T3) with three ablation configurations. Against a zero-shot baseline, CADSmith achieves a 100% execution rate (up from 95%), improves the median F1 score from 0.9707 to 0.9846 and the median IoU from 0.8085 to 0.9629, and reduces the mean Chamfer Distance from 28.37 to 0.74. These results demonstrate that closed-loop refinement with programmatic geometric feedback substantially improves the quality and reliability of LLM-generated CAD models.
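For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute IoU, Chamfer Distance, and F1 between a generated shape and a reference shape. The abstract does not give the paper's exact definitions, so the choices here are assumptions: IoU over occupied voxel sets, Chamfer Distance as the sum of the two mean nearest-neighbor distances (unsquared), and F1 from point-level precision and recall at a distance threshold `tau`. All function names are illustrative.

```python
import math
from typing import Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_iou(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Intersection over union of two sets of occupied voxel indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _nearest(p: Point, cloud: Sequence[Point]) -> float:
    """Euclidean distance from p to its nearest neighbor in cloud."""
    return min(math.dist(p, q) for q in cloud)


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance: mean a->b distance plus mean b->a distance."""
    a_to_b = sum(_nearest(p, b) for p in a) / len(a)
    b_to_a = sum(_nearest(q, a) for q in b) / len(b)
    return a_to_b + b_to_a


def f1_score(pred: Sequence[Point], ref: Sequence[Point], tau: float) -> float:
    """F1 of point-level precision and recall within distance tau."""
    precision = sum(_nearest(p, ref) <= tau for p in pred) / len(pred)
    recall = sum(_nearest(q, pred) <= tau for q in ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
    print(chamfer_distance(pred, ref))
    print(f1_score(pred, ref, tau=0.5))
    print(voxel_iou({(0, 0, 0), (1, 0, 0)}, {(1, 0, 0), (2, 0, 0)}))
```

Because Chamfer Distance averages raw distances, a few badly wrong outputs can dominate its mean, while median F1 and IoU are insensitive to such outliers. That may be why the reported mean Chamfer Distance changes far more than the median scores.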