Intrinsic Gradient Suppression for Label-Noise Prompt Tuning in Vision-Language Models

arXiv cs.CV / 5/4/2026


Key Points

  • Contrastive vision-language models like CLIP generalize well in zero-shot settings, but prompt tuning is vulnerable to label noise because mislabeled samples produce disproportionately large gradients that can override the model’s pretrained priors.
  • The paper proposes Double-Softmax Prompt Tuning (DSPT), a hyperparameter-free technique that intrinsically suppresses gradients from high-error (likely noisy) samples while preserving useful updates.
  • DSPT works by applying a sequential probabilistic normalization that creates a self-adaptive “saturation zone,” effectively filtering out harmful gradient signals caused by label noise (a minimal sketch follows this list).
  • The authors provide theoretical analysis and empirical evidence of how DSPT achieves adaptive gradient suppression, recasting “gradient vanishing,” traditionally a training bottleneck, as a noise filter.
  • Extensive experiments across multiple noisy benchmarks show DSPT is a simple drop-in design that achieves state-of-the-art robustness, outperforming more complex methods that rely on handcrafted hyperparameters.
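
To make this concrete, here is a minimal PyTorch sketch of the idea, under the assumption that the "sequential probabilistic normalization" simply feeds the class probabilities through a second softmax before computing the cross-entropy. The paper's exact formulation may differ; the function name `double_softmax_ce` and the small epsilon are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def double_softmax_ce(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy on sequentially normalized (double-softmax) scores.

    logits : (batch, num_classes) similarity scores, e.g. CLIP image-text logits.
    targets: (batch,) integer class labels, possibly noisy.
    """
    # First normalization: ordinary softmax over classes.
    p = F.softmax(logits, dim=-1)
    # Second normalization: softmax applied to the probabilities themselves.
    # Because p is bounded in [0, 1], this second step flattens the output
    # distribution and saturates the gradient for samples the model gets
    # badly wrong -- the ones most likely to be mislabeled.
    q = F.softmax(p, dim=-1)
    return F.nll_loss(torch.log(q + 1e-12), targets)

# Example: a batch of 2 samples over 3 classes.
loss = double_softmax_ce(torch.randn(2, 3), torch.tensor([0, 2]))
```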

Abstract

Contrastive vision-language models like CLIP exhibit remarkable zero-shot generalization. However, prompt tuning remains highly sensitive to label noise, as mislabeled samples generate disproportionately large gradients that can overwhelm pre-trained priors. We argue that because CLIP already provides a near-optimal initialization, adaptation should be inherently conservative, particularly against the extreme gradient updates common in noisy settings. To this end, we propose Double-Softmax Prompt Tuning (DSPT), a hyperparameter-free method for intrinsic gradient suppression. By applying a sequential probabilistic normalization, DSPT induces a self-adaptive saturation zone that suppresses gradients from high-error noisy samples while maintaining informative updates. We also provide both theoretical analysis and empirical evidence of how this mechanism achieves adaptive suppression. This design transforms “gradient vanishing,” traditionally a training bottleneck, into a principled noise-filtering shield for label-noise prompt tuning. Extensive experiments confirm that this simple, drop-in design achieves state-of-the-art robustness across various noisy benchmarks, outperforming methods with complex architectures and handcrafted hyperparameters.
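
As a quick, hypothetical illustration of the suppression effect described above, the snippet below compares the logit gradient under standard cross-entropy with the double-softmax variant sketched earlier, on a single sample whose (possibly noisy) label strongly disagrees with the model's prediction. The numbers are illustrative only and are not results from the paper.

```python
import torch
import torch.nn.functional as F

# A sample the model is confident about, paired with a (likely noisy) label
# that disagrees with that prediction.
logits = torch.tensor([[8.0, 0.0, 0.0]], requires_grad=True)  # confident in class 0
noisy_target = torch.tensor([2])                               # label says class 2

# Standard cross-entropy: the mislabel produces a near-maximal logit gradient.
ce_loss = F.cross_entropy(logits, noisy_target)
g_ce, = torch.autograd.grad(ce_loss, logits)

# Double-softmax variant (see the sketch above): the second normalization acts
# on bounded inputs, so the same mislabel yields a tiny gradient instead.
q = F.softmax(F.softmax(logits, dim=-1), dim=-1)
ds_loss = F.nll_loss(torch.log(q + 1e-12), noisy_target)
g_ds, = torch.autograd.grad(ds_loss, logits)

print(f"max |grad|, cross-entropy:  {g_ce.abs().max().item():.4f}")   # ~1.0
print(f"max |grad|, double softmax: {g_ds.abs().max().item():.6f}")   # ~0.0006
```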