IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation
arXiv cs.CV / 4/15/2026
Key Points
- IAD-Unify is a proposed dual-encoder unified vision-language framework that jointly supports industrial anomaly segmentation, region-grounded natural-language understanding, and controlled defect edit generation within one architecture and evaluation setup.
- The method uses a frozen DINOv2-based region expert to provide precise anomaly evidence to a shared Qwen3.5-4B vision-language backbone via lightweight token injection, enabling mask-guided generation.
- To standardize comparison across tasks, the authors introduce Anomaly-56K, a unified multi-task evaluation platform with 59,916 images spanning 24 categories and 104 defect variants.
- Experiments show that region grounding is critical for understanding (removing it drops location accuracy by over 76 percentage points) and that region-grounded generation improves full-image fidelity and masked-region perceptual quality.
- IAD-Unify also demonstrates strong performance on the MMAD benchmark, including generalization to categories unseen during training, suggesting robust cross-category transfer.
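The token injection mentioned in the second point can be pictured with a small sketch. The summary does not say how the region expert's output reaches the backbone beyond "lightweight token injection", so everything here is an assumption for illustration: mask-averaged pooling of patch features, a single linear projection into the backbone's embedding width, and appending the resulting region tokens to the backbone's token sequence. The function names (`mask_pool`, `linear`, `inject_region_tokens`) are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: pooling, projection, and concatenation are assumed,
# not the authors' confirmed design.

def linear(x, weight):
    """Project vector x (length d_in) with weight (d_out rows of length d_in)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def mask_pool(patch_feats, mask):
    """Average the patch features whose mask entry is truthy."""
    selected = [feat for feat, keep in zip(patch_feats, mask) if keep]
    if not selected:
        # Empty mask: fall back to a zero vector of the feature width.
        return [0.0] * len(patch_feats[0])
    n = len(selected)
    return [sum(column) / n for column in zip(*selected)]


def inject_region_tokens(backbone_tokens, patch_feats, masks, proj_weight):
    """Append one projected region token per mask to the backbone tokens."""
    region_tokens = [
        linear(mask_pool(patch_feats, mask), proj_weight) for mask in masks
    ]
    return backbone_tokens + region_tokens


# Toy example: 3 patches from a frozen region expert (feature width 2),
# one anomaly mask covering the first two patches, backbone width 3.
patch_feats = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
masks = [[1, 1, 0]]
proj_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
backbone_tokens = [[0.0, 0.0, 0.0]]

tokens = inject_region_tokens(backbone_tokens, patch_feats, masks, proj_weight)
print(len(tokens), tokens[-1])
```

In this toy setup the region expert's features are never updated, which matches the "frozen" description; only the projection would carry trainable weights under these assumptions.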