Backdoor Directions in Vision Transformers
arXiv cs.CV / 3/12/2026
Key Points
- The paper identifies a specific 'trigger direction' in Vision Transformer activations that encodes the internal representation of a backdoor when a trigger is present.
- It demonstrates the causal role of this direction by showing that interventions in both activation and parameter space consistently modulate backdoor behavior across multiple datasets and attack types.
- The trigger direction is used as a diagnostic tool to trace how backdoor features are processed across layers, revealing distinct logic for static-patch versus stealthy distributed triggers.
- The study examines the link between backdoors and adversarial attacks, testing whether perturbations found with projected gradient descent (PGD) can activate or deactivate the identified trigger mechanism.
- It proposes a data-free, weight-based detection scheme for stealthy-trigger attacks, illustrating how mechanistic interpretability can diagnose and address security vulnerabilities in computer vision.
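The summary does not say how the paper estimates the trigger direction or implements its interventions. As a rough illustration only, the sketch below assumes a difference-of-means estimator (the mean activation on triggered inputs minus the mean on clean inputs, normalized to unit length) and two simple activation-space interventions: projecting the direction out, and adding a scaled copy of it. All function names are hypothetical, the activations are synthetic lists of floats rather than real Vision Transformer features, and plain Python is used only to keep the example dependency-free.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def trigger_direction(clean_acts, triggered_acts):
    """ASSUMPTION: difference-of-means estimate, not confirmed as the paper's method.

    Returns the unit vector pointing from the clean mean to the triggered mean.
    """
    mu_clean = mean_vector(clean_acts)
    mu_trig = mean_vector(triggered_acts)
    diff = [t - c for t, c in zip(mu_trig, mu_clean)]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0.0:
        raise ValueError("clean and triggered means coincide; no direction found")
    return [d / norm for d in diff]


def ablate(activation, direction):
    """Remove the component of `activation` along the unit vector `direction`."""
    coeff = dot(activation, direction)
    return [a - coeff * d for a, d in zip(activation, direction)]


def steer(activation, direction, alpha):
    """Add `alpha` times the unit vector `direction` to `activation`."""
    return [a + alpha * d for a, d in zip(activation, direction)]


def demo(dim=8, n=200, seed=0):
    """Synthetic check: triggered activations are shifted along axis 0."""
    rng = random.Random(seed)
    clean = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    triggered = [[rng.gauss(0.0, 1.0) + (3.0 if i == 0 else 0.0)
                  for i in range(dim)] for _ in range(n)]
    direction = trigger_direction(clean, triggered)
    sample = triggered[0]
    print("projection before ablation:", round(dot(sample, direction), 3))
    print("projection after ablation: ",
          round(dot(ablate(sample, direction), direction), 3))


if __name__ == "__main__":
    demo()
```

In a real model, the same projection would typically be applied to a chosen layer's activations during the forward pass. A parameter-space variant would modify weights instead, which this sketch does not attempt.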