FB-CLIP: Fine-Grained Zero-Shot Anomaly Detection with Foreground-Background Disentanglement
arXiv cs.CV / 3/23/2026
Key Points
- FB-CLIP introduces foreground-background disentanglement to enable fine-grained zero-shot anomaly detection and localization, reducing interference from backgrounds.
- It enriches the textual cues by combining End-of-Text features, global-pooled representations, and attention-weighted token features, giving the model richer semantic guidance.
- The visual module applies multi-view soft separation along identity, semantic, and spatial dimensions with background suppression to improve discriminability.
- Semantic Consistency Regularization aligns image features with normal and abnormal textual prototypes to enlarge semantic gaps and suppress uncertain matches.
- The authors report that experiments show effective anomaly detection and localization in zero-shot settings, including complex scenes.
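
To make the text-prototype idea in the points above more concrete, here is a minimal sketch of generic CLIP-style zero-shot anomaly scoring. It is not FB-CLIP's implementation: the function names (`fuse_text_features`, `anomaly_map`), the equal-weight averaging of the three text views, the use of similarity to the End-of-Text feature as attention weights, the temperature of 0.07, and the toy 2-D feature vectors are all assumptions made for illustration. Foreground-background separation and Semantic Consistency Regularization are not modeled.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_text_features(tokens, eot_index):
    """Build one text prototype from three views of the token features.

    Assumption: the three views are simply averaged; the paper may combine
    them differently.
    """
    dim = len(tokens[0])
    eot = tokens[eot_index]                      # End-of-Text feature
    pooled = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    # Assumed attention: weight each token by its similarity to the EOT feature.
    weights = softmax([dot(t, eot) for t in tokens])
    attended = [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
    fused = [(a + b + c) / 3 for a, b, c in zip(eot, pooled, attended)]
    return l2_normalize(fused)

def anomaly_map(patches, normal_proto, abnormal_proto, temperature=0.07):
    """Per-patch probability of 'abnormal' from cosine similarity to both prototypes."""
    scores = []
    for patch in patches:
        p = l2_normalize(patch)
        logits = [dot(p, normal_proto) / temperature,
                  dot(p, abnormal_proto) / temperature]
        scores.append(softmax(logits)[1])
    return scores

# Toy stand-ins for encoder outputs (hypothetical values, not real CLIP features).
normal_tokens = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.0]]
abnormal_tokens = [[0.0, 1.0], [0.1, 0.9], [0.0, 1.0]]
normal_proto = fuse_text_features(normal_tokens, eot_index=2)
abnormal_proto = fuse_text_features(abnormal_tokens, eot_index=2)

patches = [[0.95, 0.05], [0.1, 0.9]]   # first patch looks normal, second abnormal
scores = anomaly_map(patches, normal_proto, abnormal_proto)
image_score = max(scores)              # image-level score: the most anomalous patch
```

The per-patch scores act as a coarse localization map, and taking their maximum gives an image-level score. In a real pipeline the token and patch features would come from a CLIP text and image encoder rather than hand-written lists.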