STRAP-ViT: Segregated Tokens with Randomized Transformations for Defense against Adversarial Patches in ViTs
arXiv cs.CV / 3/16/2026
Key Points
- STRAP-ViT proposes a non-trainable, plug-and-play defense for Vision Transformers that detects anomalous tokens using Jensen-Shannon Divergence during a Detection Phase.
- It then applies randomized composite transformations to the segregated tokens in a Mitigation Phase to neutralize adversarial patches without requiring additional training.
- The defense uses a hyper-parameter to require that at least 50% of the patch is covered by transformed tokens, balancing robustness and efficiency.
- Experiments on ViT-base-16 and DinoV2 across ImageNet and Caltech-101 against multiple patch attacks show robust accuracy within 2-3% of the clean baselines, outperforming the prior state of the art.
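The two-phase pipeline in the points above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the divergence threshold, the use of the mean token distribution as the reference, and the particular "composite transform" (permutation plus noise) are all assumptions made for the example.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two discrete distributions.
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def detect_anomalous_tokens(token_dists, threshold=0.1):
    # Detection Phase (sketch): flag tokens whose distribution diverges
    # from the mean token distribution by more than a threshold.
    # The threshold value here is illustrative, not from the paper.
    mean_dist = token_dists.mean(axis=0)
    scores = np.array([js_divergence(t, mean_dist) for t in token_dists])
    return scores > threshold

def randomize_tokens(tokens, mask, rng):
    # Mitigation Phase (sketch): apply a randomized transform to the
    # segregated tokens. A random permutation plus Gaussian noise is a
    # stand-in for the paper's composite transformations.
    out = tokens.copy()
    for i in np.where(mask)[0]:
        out[i] = rng.permutation(tokens[i]) + rng.normal(0.0, 0.1, tokens[i].shape)
    return out
```

Because the defense only inspects and perturbs token-level statistics, it needs no retraining of the backbone, which is what makes a plug-and-play deployment plausible.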
