AGFT: Alignment-Guided Fine-Tuning for Zero-Shot Adversarial Robustness of Vision-Language Models

arXiv cs.CV / 4/1/2026


Key Points

  • The paper addresses a problem where pre-trained vision-language models (VLMs) are vulnerable to adversarial perturbations despite strong zero-shot performance.
  • It argues that existing label-based adversarial fine-tuning can break the models’ cross-modal alignment, harming image-text correspondence and reducing zero-shot accuracy.
  • It introduces Alignment-Guided Fine-Tuning (AGFT), which uses the original model’s probabilistic (soft) predictions to guide adversarial training while preserving the relative structure between visual features and textual embeddings.
  • To mitigate fine-tuning-induced structural shifts, AGFT adds a distribution consistency calibration step that aligns the robust model’s outputs with a temperature-scaled version of the pre-trained model.
  • Experiments across multiple zero-shot benchmarks show AGFT outperforms prior state-of-the-art approaches, yielding stronger zero-shot adversarial robustness without sacrificing cross-modal semantics.
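The soft-alignment idea in the third bullet can be sketched concretely. In a CLIP-style VLM, an image's "prediction" over class prompts is a softmax over cosine similarities between its visual feature and the text embeddings; AGFT-style training would pull the adversarial image's alignment distribution toward the frozen original model's soft predictions rather than toward hard labels. The function names, the temperature `tau=0.07`, and the KL(teacher‖student) direction below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def alignment_distribution(image_feats, text_embeds, tau=0.07):
    # CLIP-style soft prediction: softmax over cosine similarities
    # between each image feature and all class-prompt text embeddings.
    img = l2_normalize(image_feats)
    txt = l2_normalize(text_embeds)
    return softmax(img @ txt.T / tau, axis=-1)

def soft_alignment_loss(adv_feats, clean_feats_frozen, text_embeds, tau=0.07):
    # KL(teacher || student): align the adversarial features' distribution
    # with the frozen pre-trained model's soft predictions on clean input,
    # preserving relative image-text relationships instead of hard labels.
    p_teacher = alignment_distribution(clean_feats_frozen, text_embeds, tau)
    p_student = alignment_distribution(adv_feats, text_embeds, tau)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    return float(np.mean(kl))
```

When the adversarial features coincide with the clean features, the loss is zero; any divergence in the alignment distribution is penalized.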

Abstract

Pre-trained vision-language models (VLMs) exhibit strong zero-shot generalization but remain vulnerable to adversarial perturbations. Existing classification-guided adversarial fine-tuning methods often disrupt pre-trained cross-modal alignment, weakening visual-textual correspondence and degrading zero-shot performance. In this paper, we propose an Alignment-Guided Fine-Tuning (AGFT) framework that enhances zero-shot adversarial robustness while preserving the cross-modal semantic structure. Unlike label-based methods, which rely on hard labels and fail to maintain the relative relationships between image and text, AGFT leverages the probabilistic predictions of the original model for text-guided adversarial training, aligning adversarial visual features with textual embeddings via soft alignment distributions. To address structural discrepancies introduced by fine-tuning, we introduce a distribution consistency calibration mechanism that adjusts the robust model's outputs to match a temperature-scaled version of the pre-trained model's predictions. Extensive experiments across multiple zero-shot benchmarks demonstrate that AGFT outperforms state-of-the-art methods, achieving significantly stronger zero-shot adversarial robustness without sacrificing cross-modal semantics.
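The distribution consistency calibration step can be sketched as a distillation-style objective: the robust (fine-tuned) model's output distribution is matched to a softened, temperature-scaled version of the frozen pre-trained model's predictions. The specific temperature value and the KL direction here are assumptions for illustration; the paper's exact formulation may differ:

```python
import numpy as np

def tempered_softmax(logits, t=1.0, axis=-1):
    # Softmax with temperature t; larger t yields a softer distribution.
    z = logits / t
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def consistency_calibration_loss(robust_logits, pretrained_logits, temperature=2.0):
    # Align the robust model's outputs with a temperature-scaled
    # (softened) version of the frozen pre-trained model's predictions,
    # mitigating structural shifts introduced by fine-tuning.
    target = tempered_softmax(pretrained_logits, t=temperature)  # softened teacher
    pred = tempered_softmax(robust_logits, t=1.0)                # robust model output
    kl = np.sum(target * (np.log(target + 1e-12) - np.log(pred + 1e-12)), axis=-1)
    return float(np.mean(kl))
```

Raising the temperature increases the entropy of the teacher distribution, so the calibration constrains the robust model toward the pre-trained model's relative ranking of classes rather than its sharpest peak.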