BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder
arXiv cs.CV / 3/13/2026
Key Points
- The paper introduces BackdoorIDS, a zero-shot, inference-time method to detect backdoors in pretrained vision encoders without requiring retraining.
- It relies on the concepts of Attention Hijacking and Restoration, using progressive input masking to observe how attention and embeddings shift as the trigger is masked.
- BackdoorIDS builds an embedding sequence along the masking trajectory and uses density-based clustering (e.g., DBSCAN) to determine if an input is backdoored, flagging those whose embeddings form more than one cluster.
- The method is plug-and-play and compatible with a wide range of encoder architectures (CNNs, ViTs, CLIP, LLaVA-1.5) and reportedly outperforms existing defenses across various attack types and datasets.
- It operates fully zero-shot at inference time, making it practical to deploy on third-party encoders whose training provenance cannot be verified, and without any model retraining.
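The detection pipeline described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `embed_fn` and the mask schedule are placeholders for a real encoder and masking strategy, and a simple single-linkage grouping stands in for DBSCAN. The core idea it demonstrates is the same: collect embeddings along a progressive-masking trajectory and flag the input if they split into more than one density cluster.

```python
import numpy as np

def cluster_count(embeddings, eps=0.5):
    """Count connected components where points within L2 distance `eps`
    are linked -- a crude stand-in for DBSCAN's density clustering."""
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        # Union-find with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(embeddings[i] - embeddings[j]) < eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

def is_backdoored(embed_fn, image, masks, eps=0.5):
    """Progressively mask the input, embed each masked variant, and flag
    the input if the embedding trajectory forms more than one cluster.
    For a clean input, embeddings drift smoothly (one cluster); for a
    triggered input, they jump once the trigger is masked out."""
    trajectory = np.stack([embed_fn(image * m) for m in masks])
    return cluster_count(trajectory, eps) > 1
```

For a clean input the masking trajectory stays in one tight cluster; masking out a trigger produces a discontinuous jump in embedding space, yielding a second cluster and a positive detection.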