AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models
arXiv cs.CV / 4/1/2026
Key Points
- The paper addresses a problem where pre-trained vision-language models (VLMs) are vulnerable to adversarial perturbations despite strong zero-shot performance.
- It argues that existing label-based adversarial fine-tuning can break the models’ cross-modal alignment, harming image-text correspondence and reducing zero-shot accuracy.
- It introduces Alignment-Guided Fine-Tuning (AGFT), which uses the original model’s probabilistic (soft) predictions to guide adversarial training while preserving the relative structure between visual features and textual embeddings.
- To mitigate structural shifts induced by fine-tuning, AGFT adds a distribution consistency calibration step that aligns the robust model's outputs with the temperature-scaled predictions of the pre-trained model.
- Experiments across multiple zero-shot benchmarks show AGFT outperforms prior state-of-the-art approaches, yielding stronger zero-shot adversarial robustness without sacrificing cross-modal semantics.
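The calibration step described above amounts to a distillation-style consistency term: the frozen pre-trained model provides temperature-softened soft targets, and the robust model is penalized for diverging from them. A minimal NumPy sketch of such a loss is below; the function name, temperature value, and KL direction are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distribution_consistency_loss(robust_logits, pretrained_logits, temperature=2.0):
    """KL(teacher || student) between the frozen pre-trained model's
    temperature-softened predictions (teacher) and the robust model's
    predictions (student). A generic distillation-style consistency term,
    not AGFT's exact objective."""
    teacher = softmax(pretrained_logits, temperature)
    student = softmax(robust_logits, temperature)
    eps = 1e-12  # avoid log(0)
    kl = (teacher * (np.log(teacher + eps) - np.log(student + eps))).sum(axis=-1)
    return kl.mean()

# When the robust model reproduces the pre-trained distribution, the loss is zero;
# any divergence yields a positive penalty.
pretrained = np.array([[2.0, 0.5, -1.0]])
shifted = np.array([[0.0, 2.0, -1.0]])
print(distribution_consistency_loss(pretrained, pretrained))  # ~0.0
print(distribution_consistency_loss(shifted, pretrained) > 0)  # True
```

In practice this term would be added to the adversarial training loss, computed on adversarial inputs, so robustness is learned without drifting from the pre-trained model's cross-modal predictions.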