Feature Perturbation Pool-based Fusion Network for Unified Multi-Class Industrial Defect Detection

arXiv cs.CV / April 22, 2026


Key Points

  • The paper introduces FPFNet, a unified multi-class industrial defect detection model aimed at avoiding the need to train separate networks per defect category.
  • FPFNet uses a stochastic feature perturbation pool that injects diverse noise patterns (Gaussian noise, F-Noise, and F-Drop) into feature representations to improve robustness to domain shifts and previously unseen defect morphologies.
  • A multi-layer feature fusion module with residual connections and normalization combines hierarchical features from both encoder and decoder to capture cross-scale relationships while preserving spatial detail for accurate defect localization.
  • Built on the UniAD architecture, the approach reports state-of-the-art results on MVTec-AD and VisA, with improvements in both image-level and pixel-level AUROC, while adding no extra learnable parameters or computational overhead.
  • Experiments suggest the method mitigates degraded performance commonly caused by inter-class feature perturbation when multiple defect categories are modeled together.
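
The perturbation pool described above can be illustrated with a minimal sketch: at each training step, one perturbation is sampled at random and applied to the extracted feature map. The paper does not publish exact formulations here, so the `f_noise` and `f_drop` variants below are assumptions (frequency-domain noise and frequency-component dropping, respectively); only the additive Gaussian case is a standard operation.

```python
import numpy as np

def gaussian_noise(feat, rng, sigma=0.1):
    # Additive Gaussian noise on the feature map.
    return feat + rng.normal(0.0, sigma, size=feat.shape)

def f_noise(feat, rng, sigma=0.1):
    # Assumption: multiplicative noise on the 2-D Fourier spectrum
    # of each channel, mapped back to the spatial domain.
    spec = np.fft.fft2(feat, axes=(-2, -1))
    spec = spec * (1.0 + rng.normal(0.0, sigma, size=spec.shape))
    return np.fft.ifft2(spec, axes=(-2, -1)).real

def f_drop(feat, rng, drop_ratio=0.1):
    # Assumption: randomly zero out a fraction of frequency components.
    spec = np.fft.fft2(feat, axes=(-2, -1))
    mask = rng.random(spec.shape) >= drop_ratio
    return np.fft.ifft2(spec * mask, axes=(-2, -1)).real

def perturb(feat, rng):
    # The "pool": draw one perturbation uniformly per training step.
    ops = [gaussian_noise, f_noise, f_drop]
    op = ops[rng.integers(len(ops))]
    return op(feat, rng)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 32, 32))  # (C, H, W) feature map
out = perturb(feat, rng)
```

Because each operation is applied to features rather than pixels, the pool enriches the training distribution without any learnable parameters, consistent with the paper's claim of zero added parameters.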

Abstract

Multi-class defect detection constitutes a critical yet challenging task in industrial quality inspection, where existing approaches typically suffer from two fundamental limitations: (i) the necessity of training separate models for each defect category, resulting in substantial computational and memory overhead, and (ii) degraded robustness caused by inter-class feature perturbation when heterogeneous defect categories are jointly modeled. In this paper, we present FPFNet, a Feature Perturbation Pool-based Fusion Network that synergistically integrates a stochastic feature perturbation pool with a multi-layer feature fusion strategy to address these challenges within a unified detection framework. The feature perturbation pool enriches the training distribution by randomly injecting diverse noise patterns -- including Gaussian noise, F-Noise, and F-Drop -- into the extracted feature representations, thereby strengthening the model's robustness against domain shifts and unseen defect morphologies. Concurrently, the multi-layer feature fusion module aggregates hierarchical feature representations from both the encoder and decoder through residual connections and normalization, enabling the network to capture complex cross-scale relationships while preserving fine-grained spatial details essential for precise defect localization. Built upon the UniAD architecture [You et al., 2022], our method achieves state-of-the-art performance on two widely adopted benchmarks: 97.17% image-level AUROC and 96.93% pixel-level AUROC on MVTec-AD, and 91.08% image-level AUROC and 99.08% pixel-level AUROC on VisA, surpassing existing methods by notable margins while introducing no additional learnable parameters or computational complexity.
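
The multi-layer fusion described in the abstract combines hierarchical encoder and decoder features through residual connections and normalization. A minimal sketch of that idea follows; the resampling scheme, the channel-wise normalization, and the helper names (`fuse`, `resize_nearest`) are illustrative assumptions, not the paper's exact module.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize across the channel dimension at each spatial location
    # (assumption: LayerNorm-style, parameter-free).
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def resize_nearest(x, hw):
    # Nearest-neighbour resampling of a (C, h, w) map to (C, H, W);
    # a stand-in for whatever interpolation the real module uses.
    _, h, w = x.shape
    H, W = hw
    ri = np.arange(H) * h // H
    ci = np.arange(W) * w // W
    return x[:, ri][:, :, ci]

def fuse(enc_feats, dec_feats):
    # Hypothetical multi-layer fusion: bring every encoder/decoder level
    # to the finest resolution, accumulate them through a residual sum,
    # and normalize after each level.
    target_hw = enc_feats[0].shape[-2:]
    fused = np.zeros_like(enc_feats[0])
    for e, d in zip(enc_feats, dec_feats):
        e = resize_nearest(e, target_hw)
        d = resize_nearest(d, target_hw)
        fused = layer_norm(fused + e + d)  # residual connection + norm
    return fused

rng = np.random.default_rng(1)
# Three pyramid levels, 4 channels each, at 32/16/8 spatial resolution.
enc = [rng.standard_normal((4, s, s)) for s in (32, 16, 8)]
dec = [rng.standard_normal((4, s, s)) for s in (32, 16, 8)]
fused = fuse(enc, dec)
```

Keeping the accumulation at the finest resolution is what preserves the fine-grained spatial detail the abstract emphasizes for pixel-level localization, while the summed coarse levels contribute the cross-scale context.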