Large-Scale Universal Defect Generation: Foundation Models and Datasets

arXiv cs.CV / 4/13/2026


Key Points

  • The paper introduces UDG, a large-scale dataset of 300K normal/abnormal/mask/caption quadruplets across diverse domains to overcome limited paired defect editing data in prior few-shot methods.
  • It presents UniDG, a universal foundation model for defect generation that supports both reference-based generation and text instruction-based defect editing without per-category fine-tuning.
  • UniDG uses Defect-Context Editing with adaptive defect cropping and a structured “diptych” input format, and it fuses reference and target conditions via MM-DiT multimodal attention.
  • A two-stage training approach (Diversity-SFT followed by Consistency-RFT) is used to improve diversity while also boosting realism and consistency with reference conditions.
  • Experiments on MVTec-AD and VisA indicate UniDG outperforms existing few-shot anomaly generation and image insertion/editing baselines and improves downstream anomaly detection/localization.
  • The authors plan to release code at the provided GitHub repository.
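The Defect-Context Editing idea in the bullets above (adaptively crop a window around the defect, then place reference and target side by side as a diptych) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the padding ratio, square-window choice, and horizontal layout are all assumptions.

```python
import numpy as np

def adaptive_defect_crop(image: np.ndarray, mask: np.ndarray,
                         pad_ratio: float = 0.5) -> np.ndarray:
    """Crop a square window around the defect mask, with padding that
    scales with defect size (a stand-in for the paper's adaptive cropping)."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    h, w = mask.shape
    # Window side grows with the defect's extent, clamped to the image.
    side = min(int(max(y1 - y0, x1 - x0) * (1 + pad_ratio)), h, w)
    cy, cx = (y0 + y1) // 2, (x0 + x1) // 2
    top = int(np.clip(cy - side // 2, 0, h - side))
    left = int(np.clip(cx - side // 2, 0, w - side))
    return image[top:top + side, left:left + side]

def make_diptych(normal_crop: np.ndarray, target_crop: np.ndarray) -> np.ndarray:
    """Concatenate the normal (reference) crop and the target crop side by
    side, giving the structured two-panel input the model edits in context."""
    assert normal_crop.shape == target_crop.shape
    return np.concatenate([normal_crop, target_crop], axis=1)
```

In this sketch the diptych keeps the defect and its surrounding normal context in one canvas, which is what lets a single editing model handle defects of very different scales.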

Abstract

Existing defect/anomaly generation methods often rely on few-shot learning, which overfits to specific defect categories due to the lack of large-scale paired defect editing data. This issue is aggravated by substantial variations in defect scale and morphology, resulting in limited generalization, degraded realism, and weak category consistency. We address these challenges by introducing UDG, a large-scale dataset of 300K normal-abnormal-mask-caption quadruplets spanning diverse domains, and by presenting UniDG, a universal defect generation foundation model that supports both reference-based defect generation and text instruction-based defect editing without per-category fine-tuning. UniDG performs Defect-Context Editing via adaptive defect cropping and a structured diptych input format, and fuses reference and target conditions through MM-DiT multimodal attention. A two-stage training strategy, Diversity-SFT followed by Consistency-RFT, further improves diversity while enhancing realism and reference consistency. Extensive experiments on MVTec-AD and VisA show that UniDG outperforms prior few-shot anomaly generation and image insertion/editing baselines in synthesis quality and in downstream single- and multi-class anomaly detection/localization. Code will be available at https://github.com/RetoFan233/UniDG.
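The two-stage recipe (supervised fine-tuning for diversity, then reward-based fine-tuning for consistency) can be illustrated with a toy regression in place of the diffusion model. Everything here is a simplified stand-in: `reward_fn`, the softmax weighting, and the linear model are illustrative assumptions, not the paper's Consistency-RFT objective.

```python
import numpy as np

def sft_step(W, X, Y, lr=0.1):
    """Stage 1 (Diversity-SFT analogue): a plain supervised gradient step
    on paired data, treating every sample equally."""
    grad = 2 * X.T @ (X @ W - Y) / len(X)
    return W - lr * grad

def rft_step(W, X, Y, reward_fn, lr=0.1):
    """Stage 2 (Consistency-RFT analogue): reweight per-sample gradients by
    a reward scoring consistency with the reference (reward_fn is a stand-in)."""
    pred = X @ W
    r = reward_fn(pred, Y)                  # one scalar reward per sample
    w = np.exp(r - r.max())                 # softmax weights over the batch:
    w /= w.sum()                            # high-reward outputs dominate
    grad = 2 * X.T @ ((pred - Y) * w[:, None])
    return W - lr * grad
```

The point of the sketch is the division of labor: stage 1 fits the full paired distribution (diversity), while stage 2 biases updates toward outputs a reward model judges realistic and reference-consistent.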