Complementary Text-Guided Attention for Zero-Shot Adversarial Robustness
arXiv cs.CV / 3/20/2026
Opinion · Models & Research
Key Points
- The authors observe that adversarial perturbations induce shifts in text-guided attention in CLIP-like models, motivating robustness improvements.
- They propose Text-Guided Attention for Zero-Shot Robustness (TGA-ZSR) with a Local Attention Refinement Module and a Global Attention Constraint Module to improve robustness while preserving clean accuracy.
- They further introduce Complementary Text-Guided Attention (Comp-TGA), which combines class-prompt guided attention with reversed attention from the non-class prompt to better capture foreground details.
- Experimental results show 9.58% and 11.95% improvements in zero-shot robust accuracy for TGA-ZSR and Comp-TGA, respectively, across 16 datasets.
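The complementary-attention idea in the key points above can be sketched as follows. This is an illustrative assumption of how such a map might be computed, not the authors' implementation: all function names, the mixing weight `alpha`, and the toy shapes are hypothetical. Given patch features and text embeddings from a CLIP-like model, a text-guided attention map is taken here as the softmax-normalized patch–prompt similarity, and the complementary map mixes the class-prompt attention with the reversed (1 − a) attention of a non-class prompt, so regions the non-class prompt ignores are upweighted as likely foreground.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

def text_guided_attention(patch_feats, text_emb):
    # cosine similarity between each image patch feature and the text
    # embedding, normalized into an attention distribution over patches
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    return softmax(p @ t)

def complementary_attention(patch_feats, class_emb, nonclass_emb, alpha=0.5):
    # Illustrative Comp-TGA-style map (alpha is a hypothetical weight):
    # combine class-prompt attention with the reversed attention of a
    # non-class prompt, i.e. high where the non-class prompt attends weakly.
    a_cls = text_guided_attention(patch_feats, class_emb)
    a_non = text_guided_attention(patch_feats, nonclass_emb)
    rev = 1.0 - a_non
    rev = rev / rev.sum()                  # renormalize the reversed map
    comp = alpha * a_cls + (1.0 - alpha) * rev
    return comp / comp.sum()

# toy example: 4 image patches, 8-dimensional embedding space
rng = np.random.default_rng(0)
patches = rng.normal(size=(4, 8))
attn = complementary_attention(patches, rng.normal(size=8), rng.normal(size=8))
print(attn.shape, float(attn.sum()))
```

The renormalization steps just keep both component maps comparable as distributions over patches; the actual loss terms in the paper's Local Attention Refinement and Global Attention Constraint modules are not reproduced here.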