GIFT: Bootstrapping Image-to-CAD Program Synthesis via Geometric Feedback

arXiv cs.LG / 3/31/2026


Key Points

  • The paper identifies the core bottleneck in image-to-CAD program synthesis: a shortage of training data that reliably aligns visual geometry with symbolic program syntax as design complexity grows.
  • It proposes Geometric Inference Feedback Tuning (GIFT), a data augmentation framework that uses geometric feedback to bootstrap additional high-quality training examples from test-time predictions.
  • GIFT includes two techniques—Soft-Rejection Sampling to keep diverse high-fidelity programs and Failure-Driven Augmentation to turn near-miss outputs into synthetic training samples for harder geometries.
  • The method amortizes inference-time search into model parameters, yielding an ~80% reduction in inference compute while improving mean IoU by 12% versus a strong supervised baseline.
  • The authors report competitive performance relative to more complex multimodal systems, without adding human annotation or requiring specialized model architectures.
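The two mechanisms described above can be sketched as a single filtering loop over sampled candidate programs, assuming the geometric feedback is an IoU score between each candidate's rendered geometry and the target shape. All function names and thresholds below are illustrative assumptions, not values from the paper:

```python
def gift_bootstrap(sample_programs, render, iou, dataset,
                   accept_iou=0.95, near_miss_iou=0.7):
    """Turn test-time predictions into new (input, program) training pairs.

    sample_programs(image) -> list of candidate CAD programs
    render(program)        -> executable geometry for a program
    iou(a, b)              -> geometric overlap score in [0, 1]
    Thresholds are hypothetical; the paper does not specify them.
    """
    new_samples = []
    for image, target_shape in dataset:
        for prog in sample_programs(image):
            shape = render(prog)
            score = iou(shape, target_shape)
            if score >= accept_iou:
                # GIFT-REJECT (soft-rejection sampling): keep any
                # high-fidelity program, not only exact ground-truth
                # matches, preserving diverse correct solutions.
                new_samples.append((image, prog))
            elif score >= near_miss_iou:
                # GIFT-FAIL (failure-driven augmentation): relabel the
                # near-miss program with its *own* rendered geometry,
                # yielding a self-consistent synthetic training pair.
                new_samples.append((shape, prog))
    return new_samples
```

The key design point is that both branches produce verified pairs without human annotation: the accept branch relaxes exact-match filtering, while the near-miss branch repairs the label rather than discarding the sample.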

Abstract

Generating executable CAD programs from images requires alignment between visual geometry and symbolic program representations, a capability that current methods fail to learn reliably as design complexity increases. Existing fine-tuning approaches rely on either limited supervised datasets or expensive post-training pipelines, resulting in brittle systems that restrict progress in generative CAD design. We argue that the primary bottleneck lies not in model or algorithmic capacity, but in the scarcity of diverse training examples that align visual geometry with program syntax. This limitation is especially acute because the collection of diverse and verified engineering datasets is both expensive and difficult to scale, constraining the development of robust generative CAD models. We introduce Geometric Inference Feedback Tuning (GIFT), a data augmentation framework that leverages geometric feedback to turn test-time compute into a bootstrapped set of high-quality training samples. GIFT combines two mechanisms: Soft-Rejection Sampling (GIFT-REJECT), which retains diverse high-fidelity programs beyond exact ground-truth matches, and Failure-Driven Augmentation (GIFT-FAIL), which converts near-miss predictions into synthetic training examples that improve robustness on challenging geometries. By amortizing inference-time search into the model parameters, GIFT captures the benefits of test-time scaling while reducing inference compute by 80%. It improves mean IoU by 12% over a strong supervised baseline and remains competitive with more complex multimodal systems, without requiring additional human annotation or specialized architectures.
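Both the geometric feedback signal and the reported evaluation metric reduce to an IoU over rendered geometry. A minimal version on boolean voxel occupancy grids (the paper does not specify its exact geometric representation, so this is one plausible instantiation) looks like:

```python
import numpy as np

def voxel_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two boolean occupancy grids of the same shape."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both shapes empty: treat as a perfect match
    return float(np.logical_and(a, b).sum() / union)
```

A score of 1.0 means the rendered prediction exactly occupies the target volume; the acceptance and near-miss thresholds in a GIFT-style pipeline would be cutoffs on this value.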