AI Navigate

MIRAGE: Model-agnostic Industrial Realistic Anomaly Generation and Evaluation for Visual Anomaly Detection

arXiv cs.CV / 3/17/2026

💬 Opinion | Tools & Practical Usage | Models & Research

Key Points

  • MIRAGE is a fully automated, model-agnostic pipeline that can generate realistic industrial anomalies and corresponding pixel-level masks without any training or real anomalous images.
  • It accesses any generative model as a black box via API calls, uses a vision-language model to automatically generate defect prompts, and applies a CLIP-based quality filter to retain only well-aligned outputs.
  • A lightweight, training-free dual-branch semantic change detection module combines text-conditioned Grounding DINO features with fine-grained YOLOv26-Seg structural features to produce masks at scale.
  • Four generation methods are benchmarked on MVTec AD and VisA, using Gemini 2.5 Flash Image (Nano Banana) as the generative backbone, across two tasks: downstream anomaly segmentation and visual quality of the generated images. Quality is assessed with standard metrics (IS, IC-LPIPS) and a human perceptual study with 31 participants and 1,550 pairwise votes.
  • The authors also release a large-scale dataset of 500 image-mask pairs per category for every MVTec AD and VisA class (over 13,000 pairs in total), along with all generation prompts and the pipeline code, to support anomaly-aware industrial inspection without real defect data.
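
The CLIP-based quality filter mentioned above can be pictured as a similarity check between each generated image and its defect prompt. The following is a minimal sketch, not the authors' implementation. It assumes the image and prompt embeddings have already been computed by a CLIP-style encoder, and the function names and the 0.25 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as "not aligned"
    return dot / (norm_a * norm_b)


def filter_by_alignment(candidates, prompt_embedding, threshold=0.25):
    """Keep candidates whose image embedding aligns with the prompt embedding.

    candidates: list of (image_id, image_embedding) pairs.
    threshold: hypothetical cutoff, not a value taken from the paper.
    Returns a list of (image_id, score) pairs for the retained images.
    """
    kept = []
    for image_id, image_embedding in candidates:
        score = cosine_similarity(image_embedding, prompt_embedding)
        if score >= threshold:
            kept.append((image_id, score))
    return kept


# Toy 3-dimensional vectors stand in for real CLIP embeddings.
prompt = [1.0, 0.0, 0.0]
candidates = [
    ("gen_001", [0.9, 0.1, 0.0]),   # well aligned with the prompt
    ("gen_002", [0.0, 1.0, 0.0]),   # orthogonal to the prompt
    ("gen_003", [-1.0, 0.0, 0.0]),  # opposite direction
]
kept = filter_by_alignment(candidates, prompt, threshold=0.25)
print([image_id for image_id, _ in kept])  # prints ['gen_001']
```

In practice the cutoff would need tuning per object category, since similarity scores between real image and text embeddings tend to occupy a much narrower range than these toy vectors suggest.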

Abstract

Industrial visual anomaly detection (VAD) methods are typically trained on normal samples only, yet performance improves substantially when even limited anomalous data is available. Existing anomaly generation approaches either require real anomalous examples, demand expensive hardware, or produce synthetic defects that lack realism. We present MIRAGE (Model-agnostic Industrial Realistic Anomaly Generation and Evaluation), a fully automated pipeline for realistic anomalous image generation and pixel-level mask creation that requires no training and no anomalous images. Our pipeline accesses any generative model as a black box via API calls, uses a VLM for automatic defect prompt generation, and includes a CLIP-based quality filter to retain only well-aligned generated images. For mask generation at scale, we introduce a lightweight, training-free dual-branch semantic change detection module combining text-conditioned Grounding DINO features with fine-grained YOLOv26-Seg structural features. We benchmark four generation methods using Gemini 2.5 Flash Image (Nano Banana) as the generative backbone, evaluating performance on MVTec AD and VisA across two distinct tasks: (i) downstream anomaly segmentation and (ii) visual quality of the generated images, assessed via standard metrics (IS, IC-LPIPS) and a human perceptual study involving 31 participants and 1,550 pairwise votes. The results demonstrate that MIRAGE offers a scalable, accessible foundation for anomaly-aware industrial inspection that requires no real defect data. As a final contribution, we publicly release a large-scale dataset comprising 500 image-mask pairs per category for every MVTec AD and VisA class, over 13,000 pairs in total, alongside all generation prompts and pipeline code.
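
The abstract does not say how the two branches of the change detection module are combined, so the sketch below only illustrates one plausible scheme. It assumes each branch yields a per-pixel change map (for example, a feature distance between the normal image and its generated counterpart), normalizes both maps, averages them with a weight, and thresholds the result into a binary mask. The fusion rule, the function names, and the `weight` and `threshold` parameters are all hypothetical.

```python
def min_max_normalize(change_map):
    """Rescale a 2D list of floats to the range [0, 1]."""
    flat = [value for row in change_map for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no change signal.
        return [[0.0 for _ in row] for row in change_map]
    return [[(value - lo) / (hi - lo) for value in row] for row in change_map]


def fuse_change_maps(semantic_map, structural_map, weight=0.5, threshold=0.5):
    """Fuse two change maps into a binary mask (hypothetical fusion rule).

    semantic_map: change map from the text-conditioned branch (assumed input).
    structural_map: change map from the structural branch (assumed input).
    weight: contribution of the semantic branch, in [0, 1].
    """
    sem = min_max_normalize(semantic_map)
    struct = min_max_normalize(structural_map)
    mask = []
    for sem_row, struct_row in zip(sem, struct):
        mask.append([
            1 if weight * s + (1.0 - weight) * t >= threshold else 0
            for s, t in zip(sem_row, struct_row)
        ])
    return mask


# Toy 3x3 maps: both branches respond strongly in the lower-right region.
semantic = [[0.0, 0.2, 0.0], [0.1, 0.9, 0.8], [0.0, 0.7, 1.0]]
structural = [[0.0, 0.0, 0.1], [0.0, 2.0, 1.6], [0.2, 1.8, 1.0]]
mask = fuse_change_maps(semantic, structural)
for row in mask:
    print(row)
```

Normalizing each branch before fusion keeps one branch from dominating simply because its raw distances are on a larger scale, as with the structural map in this toy example.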