Industrial Surface Defect Detection via Diffusion Generation and Asymmetric Student-Teacher Network

arXiv cs.AI / 4/22/2026

Key Points

  • The paper proposes an unsupervised approach to three challenges in industrial surface defect detection: few defect samples, long-tailed defect distributions, and subtle defects that are hard to localize.
  • It trains a DDPM only on normal (defect-free) samples, then uses Gaussian perturbations and Perlin-noise masks to generate realistic, physically consistent synthetic defect-like samples with pixel-level annotations.
  • The method uses an asymmetric teacher–student dual-stream network, where the teacher provides stable representations of normal features and the student reconstructs normal patterns to highlight discrepancies in anomalous regions.
  • A joint training objective combines cosine similarity loss with pixel-wise segmentation supervision to improve precise localization.
  • On the MVTecAD benchmark, the approach reports 98.4% image-level AUROC and 98.3% pixel-level AUROC, outperforming prior unsupervised and mainstream deep learning methods without requiring large amounts of real defect data.
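To make the data-generation step in the key points more concrete, here is a minimal NumPy sketch of how a Perlin-noise mask can confine a fixed-variance Gaussian perturbation to an irregular region and yield a pixel-level label at the same time. It is an illustration under stated assumptions, not the paper's pipeline: the DDPM is omitted entirely, and the function names (`perlin_2d`, `synthesize_defect`) and parameters (`sigma`, `res`, `thresh`) are hypothetical choices made for this example.

```python
import numpy as np

def fade(t):
    # Perlin's quintic smoothing curve.
    return 6 * t**5 - 15 * t**4 + 10 * t**3

def perlin_2d(shape, res, rng):
    """Classic 2D Perlin noise. `res` is the number of lattice cells per axis."""
    h, w = shape
    ry, rx = res
    assert h % ry == 0 and w % rx == 0, "shape must be divisible by res"
    # Random unit gradient vectors on the (ry+1) x (rx+1) lattice.
    angles = rng.uniform(0.0, 2.0 * np.pi, size=(ry + 1, rx + 1))
    grads = np.stack([np.cos(angles), np.sin(angles)], axis=-1)
    # Pixel coordinates in lattice units, split into cell index and fraction.
    yy, xx = np.meshgrid(np.arange(h) * ry / h, np.arange(w) * rx / w, indexing="ij")
    y0, x0 = yy.astype(int), xx.astype(int)
    fy, fx = yy - y0, xx - x0

    def corner(iy, ix):
        # Dot product of a corner's gradient with the offset from that corner.
        g = grads[y0 + iy, x0 + ix]
        return g[..., 0] * (fx - ix) + g[..., 1] * (fy - iy)

    u, v = fade(fx), fade(fy)
    top = corner(0, 0) * (1 - u) + corner(0, 1) * u
    bottom = corner(1, 0) * (1 - u) + corner(1, 1) * u
    return top * (1 - v) + bottom * v

def synthesize_defect(normal, sigma=0.2, res=(4, 4), thresh=0.2, seed=0):
    """Return (synthetic image, binary mask) for a float image in [0, 1].

    Assumption: the defect is Gaussian noise with constant standard deviation
    `sigma`, applied only where thresholded Perlin noise is active.
    """
    rng = np.random.default_rng(seed)
    h, w = normal.shape[:2]
    mask = (perlin_2d((h, w), res, rng) > thresh).astype(np.float32)
    noise = rng.normal(0.0, sigma, size=normal.shape).astype(np.float32)
    m = mask[..., None] if normal.ndim == 3 else mask
    synthetic = np.clip(normal + m * noise, 0.0, 1.0)
    return synthetic, mask
```

The returned mask doubles as the segmentation target, which is why this kind of synthesis needs no manual annotation. Pixels outside the mask are left identical to the defect-free input.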

Abstract

Industrial surface defect detection often suffers from limited defect samples, severe long-tailed distributions, and difficulties in accurately localizing subtle defects under complex backgrounds. To address these challenges, this paper proposes an unsupervised defect detection method that integrates a Denoising Diffusion Probabilistic Model (DDPM) with an asymmetric teacher-student architecture. First, at the data level, the DDPM is trained solely on normal samples. By introducing constant-variance Gaussian perturbations and Perlin noise-based masks, high-fidelity and physically consistent defect samples along with pixel-level annotations are generated, effectively alleviating the data scarcity problem. Second, at the model level, an asymmetric dual-stream network is constructed. The teacher network provides stable representations of normal features, while the student network reconstructs normal patterns and amplifies discrepancies between normal and anomalous regions. Finally, a joint optimization strategy combining cosine similarity loss and pixel-wise segmentation supervision is adopted to achieve precise localization of subtle defects. Experimental results on the MVTecAD dataset show that the proposed method achieves 98.4% image-level AUROC and 98.3% pixel-level AUROC, significantly outperforming existing unsupervised and mainstream deep learning methods. The proposed approach does not require large amounts of real defect samples and enables accurate and robust industrial defect detection and localization.

Keywords: industrial defect detection, diffusion models, data generation, teacher-student architecture, pixel-level localization
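The joint objective mentioned in the abstract, a cosine similarity loss combined with pixel-wise segmentation supervision, can be sketched as follows. This is one plausible reading rather than the paper's exact formulation: the abstract does not say which segmentation loss is used or how the two terms are weighted, so binary cross-entropy and the weight `lam` are assumptions, as are all function names here.

```python
import numpy as np

def cosine_distance_map(f_teacher, f_student, eps=1e-8):
    """Per-location (1 - cosine similarity) between two (C, H, W) feature maps."""
    dot = (f_teacher * f_student).sum(axis=0)
    norm = np.linalg.norm(f_teacher, axis=0) * np.linalg.norm(f_student, axis=0)
    return 1.0 - dot / (norm + eps)

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy (assumed segmentation loss, not from the paper)."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)).mean())

def joint_loss(f_teacher, f_student, seg_pred, mask, lam=1.0):
    """Cosine feature term plus lam times the pixel-wise segmentation term.

    f_teacher, f_student: (C, H, W) features from the two streams.
    seg_pred: (H, W) predicted defect probabilities in [0, 1].
    mask: (H, W) binary ground-truth mask, e.g. from synthetic defects.
    lam: hypothetical weighting between the two terms.
    """
    cos_term = float(cosine_distance_map(f_teacher, f_student).mean())
    seg_term = bce(seg_pred, mask)
    return cos_term + lam * seg_term
```

At inference time, the same `cosine_distance_map` could serve as a raw anomaly map: locations where the student's features diverge from the teacher's receive values close to 2, while well-matched locations stay near 0.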