Intrinsic Gradient Suppression for Label-Noise Prompt Tuning in Vision-Language Models
arXiv cs.CV / 5/4/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- Contrastive vision-language models like CLIP can generalize well in zero-shot settings, but prompt tuning is vulnerable to label noise because mislabels produce very large gradients that can override the model’s pretrained priors.
- The paper proposes Double-Softmax Prompt Tuning (DSPT), a hyperparameter-free technique that intrinsically suppresses gradients from high-error (likely noisy) samples while preserving useful updates.
- DSPT works by applying a sequential probabilistic normalization that creates a self-adaptive “saturation zone,” effectively filtering out harmful gradient signals caused by label noise (see the sketch after this list).
- The authors provide theoretical analysis and empirical results showing how DSPT achieves adaptive gradient suppression, repurposing the double softmax’s usual “gradient vanishing” bottleneck as a noise-filtering mechanism.
- Extensive experiments across multiple noisy benchmarks show DSPT is a simple drop-in design that achieves state-of-the-art robustness, outperforming more complex methods that rely on handcrafted hyperparameters.
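The summary does not spell out the exact DSPT objective, but the double-softmax mechanism it describes can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, assuming `double_softmax_loss` simply chains two softmax normalizations before a cross-entropy loss; the paper’s actual prompt-tuning formulation may differ in its details.

```python
import torch
import torch.nn.functional as F

def double_softmax_loss(logits, targets):
    """Hypothetical sketch of the double-softmax idea: normalize twice
    ("sequential probabilistic normalization") before the cross-entropy
    loss. When the first-pass probabilities are already saturated
    (e.g. a sample that is confidently wrong, as mislabeled samples tend
    to be), the softmax Jacobian is nearly zero, so almost no gradient
    flows back -- the self-adaptive "saturation zone"."""
    p = F.softmax(logits, dim=-1)   # first normalization: ordinary class probabilities
    q = F.softmax(p, dim=-1)        # second normalization, applied to the probabilities themselves
    return F.nll_loss(torch.log(q), targets)

# Toy comparison: a confidently wrong ("likely noisy") sample vs. a mildly wrong one.
sharp = torch.tensor([[8.0, 0.0, 0.0, 0.0]], requires_grad=True)  # saturated prediction for class 0
mild = torch.tensor([[1.0, 0.5, 0.0, 0.0]], requires_grad=True)   # mild preference for class 0
wrong_label = torch.tensor([3])                                    # label disagrees in both cases

for name, z in [("confidently wrong", sharp), ("mildly wrong", mild)]:
    double_softmax_loss(z, wrong_label).backward()
    print(f"{name}: grad norm = {z.grad.norm().item():.4f}")
# Under plain cross-entropy the confidently wrong sample would receive the
# largest gradient; here it lands in the saturation zone and gets a
# vanishingly small one, while the mildly wrong sample still receives a
# useful update.
```

Running the toy comparison shows the confidently wrong (likely mislabeled) sample receiving a gradient orders of magnitude smaller than the mildly wrong one, which is the noise-filtering behavior the key points describe; no extra hyperparameters are involved.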
Related Articles
Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs
Anthropic News

Dara Khosrowshahi on replacing Uber drivers — and himself — with AI
The Verge

CLMA Frame Test
Dev.to

You Are Right — You Don't Need CLAUDE.md
Dev.to

Governance and Liability in AI Agents: What I Built Trying to Answer Those Questions
Dev.to