Efficient Adversarial Training via Criticality-Aware Fine-Tuning

arXiv cs.CV / 4/15/2026


Key Points

  • Vision Transformer (ViT) models scale well for standard performance, but adversarial robustness does not improve proportionally as model size increases.
  • The paper proposes Criticality-Aware Adversarial Training (CAAT), which fine-tunes only a small subset of parameters by identifying which weights are most critical for adversarial robustness.
  • CAAT uses parameter-efficient fine-tuning (PEFT) to robustly adjust only those weight matrices whose count of critical parameters exceeds a predefined threshold, reducing training cost relative to full-model adversarial training.
  • Experiments on three adversarial learning datasets show CAAT generalizes to larger ViT architectures and achieves robustness close to standard adversarial training, with only a 4.3% drop in adversarial robustness while tuning about 6% of the parameters.
  • The results indicate CAAT outperforms state-of-the-art lightweight adversarial training methods while using fewer trainable parameters, suggesting a path toward adversarial training at scale.
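The selection step in the points above can be sketched as follows. This is a hypothetical illustration, not the paper's actual criterion: it scores each parameter with the first-order sensitivity proxy |w · g| (the paper's exact criticality measure is not given in this summary), then keeps the weight matrices whose fraction of critical parameters exceeds a threshold, mirroring the "number of critical parameters passes a threshold" rule. The names `select_critical_modules`, `tau`, and `min_frac` are all assumptions.

```python
import numpy as np

def select_critical_modules(weights, grads, tau=0.5, min_frac=0.1):
    """Hypothetical sketch of CAAT-style module selection.

    Scores each parameter by |w * g| (a common first-order sensitivity
    proxy -- an assumption, not the paper's stated criterion), marks
    parameters above the global tau-quantile as critical, and selects
    the weight matrices whose fraction of critical parameters exceeds
    min_frac for PEFT-based robust fine-tuning.
    """
    # Per-parameter criticality scores for every module.
    scores = {name: np.abs(w * grads[name]) for name, w in weights.items()}
    # Global threshold: the tau-quantile over all scores.
    all_scores = np.concatenate([s.ravel() for s in scores.values()])
    thresh = np.quantile(all_scores, tau)
    # Keep only modules where enough parameters are critical.
    return [name for name, s in scores.items()
            if (s >= thresh).mean() > min_frac]
```

In a real setting the gradients would come from the adversarial loss on perturbed inputs; only the returned modules would then receive trainable adapters, while the rest of the model stays frozen.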

Abstract

Vision Transformer (ViT) models have achieved remarkable performance across various vision tasks, with scalability being a key advantage when applied to large datasets. This scalability enables ViT models to exhibit strong generalization capabilities. However, as the number of parameters increases, the robustness of ViT models to adversarial examples does not scale proportionally. Adversarial training (AT), one of the most effective methods for enhancing robustness, typically requires fine-tuning the entire model, leading to prohibitively high computational costs, especially for large ViT architectures. In this paper, we aim to robustly fine-tune only a small subset of parameters to achieve robustness comparable to standard AT. To accomplish this, we introduce Criticality-Aware Adversarial Training (CAAT), a novel method that adaptively allocates resources to the most robustness-critical parameters, fine-tuning only selected modules. Specifically, CAAT efficiently identifies parameters that contribute most to adversarial robustness. It then leverages parameter-efficient fine-tuning (PEFT) to robustly adjust weight matrices where the number of critical parameters exceeds a predefined threshold. CAAT exhibits favorable generalization when scaled to larger vision transformer architectures, potentially paving the way for adversarial training at scale; e.g., compared with plain adversarial training, CAAT incurs only a 4.3% decrease in adversarial robustness while tuning approximately 6% of the parameters. Extensive experiments on three widely used adversarial learning datasets demonstrate that CAAT outperforms state-of-the-art lightweight AT methods with fewer trainable parameters.
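The abstract says selected weight matrices are adjusted via PEFT while the rest of the model stays fixed. A minimal sketch of one common PEFT mechanism (a LoRA-style low-rank adapter) illustrates what "robustly adjust weight matrices" could look like; the paper's actual adapter design is not specified here, so the class name, rank, and scaling are assumptions.

```python
import numpy as np

class LoRAAdapter:
    """Minimal low-rank adapter (LoRA-style) -- a sketch of the kind of
    PEFT module that could be attached to a CAAT-selected weight matrix.
    Only A and B would be trained during adversarial fine-tuning."""

    def __init__(self, weight, rank=4, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen base matrix, never updated
        # A is small-random, B is zero, so the adapter starts as a no-op.
        self.A = rng.normal(scale=0.01, size=(weight.shape[0], rank))
        self.B = np.zeros((rank, weight.shape[1]))
        self.alpha = alpha

    def effective_weight(self):
        # Base weight plus the scaled low-rank update A @ B.
        return self.weight + self.alpha * (self.A @ self.B)
```

Because only the two low-rank factors are trainable, tuning a matrix of shape (d, d) costs 2·d·rank parameters instead of d², which is how PEFT keeps the trainable fraction near the ~6% figure quoted above.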