Test-Time Attention Purification for Backdoored Large Vision Language Models
arXiv cs.CV / 3/16/2026
Key Points
- The paper analyzes backdoor attacks on large vision-language models and finds that triggers influence predictions by redistributing cross-modal attention, a phenomenon the authors call attention stealing.
- It introduces CleanSight, a training-free, plug-and-play defense that operates at test time: it detects poisoned inputs via the relative visual-text attention ratio in the cross-modal fusion layers and purifies them by pruning high-attention visual tokens (see the sketch after this list).
- CleanSight preserves model utility on both clean and poisoned data and outperforms existing pixel-based purification defenses.
- The work provides extensive experiments across diverse datasets and backdoor attack types, demonstrating the method’s robustness and practical effectiveness.
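To make the detect-then-purify idea concrete, here is a minimal sketch assuming a PyTorch model that exposes a cross-modal attention map of shape [heads, queries, keys]. The ratio definition, the detection threshold, and the pruning fraction below are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of attention-ratio detection and visual-token pruning.
# All function names, thresholds, and the pruning fraction are assumptions for illustration.
import torch


def visual_text_attention_ratio(attn, visual_idx, text_idx):
    """Attention mass that text queries place on visual tokens, relative to text tokens."""
    vis_mass = attn[:, text_idx][:, :, visual_idx].sum()
    txt_mass = attn[:, text_idx][:, :, text_idx].sum()
    return (vis_mass / (txt_mass + 1e-8)).item()


def is_poisoned(attn, visual_idx, text_idx, threshold=2.0):
    """Flag an input whose visual attention is abnormally inflated ("attention stealing")."""
    return visual_text_attention_ratio(attn, visual_idx, text_idx) > threshold


def prune_high_attention_visual_tokens(visual_tokens, attn, visual_idx, text_idx, prune_frac=0.1):
    """Purify by dropping the visual tokens that receive the most cross-modal attention."""
    per_token = attn[:, text_idx][:, :, visual_idx].sum(dim=(0, 1))  # [num_visual]
    k = max(1, int(prune_frac * per_token.numel()))
    keep = torch.ones(per_token.numel(), dtype=torch.bool)
    keep[torch.topk(per_token, k).indices] = False
    return visual_tokens[keep]


if __name__ == "__main__":
    # Toy example: 8 heads, 16 visual tokens followed by 4 text tokens.
    torch.manual_seed(0)
    attn = torch.rand(8, 20, 20).softmax(dim=-1)
    visual_idx, text_idx = torch.arange(0, 16), torch.arange(16, 20)
    visual_tokens = torch.randn(16, 64)
    if is_poisoned(attn, visual_idx, text_idx):
        visual_tokens = prune_high_attention_visual_tokens(
            visual_tokens, attn, visual_idx, text_idx)
    print(visual_tokens.shape)
```

Since the purification only edits which visual tokens reach the fusion layers, a sketch like this can sit in front of an existing model at inference time without retraining, which is what makes the defense plug-and-play.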