BackdoorIDS: Zero-shot Backdoor Detection for Pretrained Vision Encoder
arXiv cs.CV · March 13, 2026
Key Points
- The paper introduces BackdoorIDS, a zero-shot, inference-time method to detect backdoors in pretrained vision encoders without requiring retraining.
- It builds on the phenomena of attention hijacking and restoration: as the trigger is progressively masked out of the input, the encoder's attention and embeddings shift back toward their clean behavior.
- BackdoorIDS builds an embedding sequence along the masking trajectory and uses density-based clustering (e.g., DBSCAN) to determine if an input is backdoored, flagging those whose embeddings form more than one cluster.
- The method is plug-and-play and compatible with a wide range of encoder architectures (CNNs, ViTs, CLIP, LLaVA-1.5) and reportedly outperforms existing defenses across various attack types and datasets.
- It operates fully zero-shot at inference time, making it practical to deploy on third-party encoders that come without provenance guarantees and without any model retraining.
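The masking-trajectory clustering test described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes the embeddings along the masking trajectory have already been extracted, and the `is_backdoored` name and the `eps`/`min_samples` values are illustrative choices.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def is_backdoored(trajectory_embeddings, eps=0.5, min_samples=2):
    """Flag an input whose masking-trajectory embeddings split into
    more than one density cluster.

    trajectory_embeddings: array of shape (n_mask_steps, embed_dim),
    one embedding per progressive-masking step. A clean input's
    embeddings drift smoothly (one cluster); a backdoored input's
    embeddings jump once the trigger is masked (multiple clusters).
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(
        trajectory_embeddings
    )
    # DBSCAN labels noise points as -1; count only real clusters.
    n_clusters = len(set(labels) - {-1})
    return n_clusters > 1
```

In this sketch, a smoothly drifting trajectory yields a single cluster and is passed as clean, while a trajectory with an abrupt embedding jump (as when a trigger is masked away) yields two clusters and is flagged.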