A Provable Energy-Guided Test-Time Defense Boosting Adversarial Robustness of Large Vision-Language Models
arXiv cs.CV / 3/31/2026
Key Points
- The paper addresses the vulnerability of large vision-language models (LVLMs) to adversarial perturbations and motivates test-time transformations as an alternative to adversarial training.
- It proposes Energy-Guided Test-Time Transformation (ET3), a training-free defense that improves robustness by transforming inputs at inference time to minimize an energy criterion (see the sketch after this list).
- The authors provide a theoretical guarantee that, under reasonable assumptions, the energy-guided transformation preserves correct classification.
- Experiments show that ET3 improves adversarial robustness not only for standard classifiers and CLIP zero-shot classification, but also for LVLM tasks such as image captioning and visual question answering.
- The research is accompanied by released code, enabling replication and experimentation (github.com/OmnAI-Lab/Energy-Guided-Test-Time-Defense).
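The paper's exact energy criterion and update rule are not given in this summary, so the following is a minimal PyTorch sketch of the general idea only: treat the classifier's negative logsumexp over logits as an energy (a common convention in energy-based-model work) and purify a possibly adversarial input with a few signed gradient-descent steps on that energy, leaving the model weights untouched. The names `energy` and `et3_transform` and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch

def energy(logits: torch.Tensor) -> torch.Tensor:
    # Energy of an input under a classifier, using the common
    # energy-based-model convention E(x) = -logsumexp_y f(x)[y].
    # ASSUMPTION: the paper's actual criterion may differ.
    return -torch.logsumexp(logits, dim=-1)

def et3_transform(model, x, steps=10, step_size=1.0 / 255):
    """Hypothetical sketch of an energy-guided test-time transformation:
    purify (possibly adversarial) inputs x by gradient descent on the
    energy, without updating any model parameters (training-free)."""
    x_t = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        e = energy(model(x_t)).sum()          # scalar energy over the batch
        (grad,) = torch.autograd.grad(e, x_t)
        with torch.no_grad():
            x_t -= step_size * grad.sign()    # signed step, PGD-style update
            x_t.clamp_(0.0, 1.0)              # keep pixels in a valid range
    return x_t.detach()

# Usage: run the model on the purified input instead of the raw one.
# logits = model(et3_transform(model, adv_images))
```

Descending the energy nudges the input back toward the high-density region the model was trained on, which is the usual intuition behind purification-style, training-free defenses.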