
Noise-aware few-shot learning through bi-directional multi-view prompt alignment

arXiv cs.CV / March 13, 2026


Key Points

  • NA-MVP introduces noise-aware few-shot learning through bi-directional multi-view prompt alignment to improve robustness of vision-language models under noisy supervision.
  • The approach uses unbalanced optimal transport to achieve fine-grained patch-to-prompt correspondence and suppress unreliable regions.
  • It features a bi-directional prompt design that captures complementary clean-oriented and noise-aware cues to emphasize stable semantics.
  • An alignment-guided selective refinement strategy uses optimal transport to correct only mislabeled samples while retaining reliable data.
  • Experiments on synthetic and real-world noisy benchmarks show state-of-the-art improvements.
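The unbalanced-optimal-transport idea behind the first bullet can be sketched in a few lines. The snippet below is a minimal, generic illustration (not the paper's implementation): it runs entropic unbalanced Sinkhorn between toy "patch" and "prompt" embeddings, where relaxing the marginal constraints lets the plan assign little mass to unreliable patches instead of forcing every patch to match a prompt. All names (`unbalanced_sinkhorn`, the toy data, the cosine cost) are assumptions for illustration.

```python
import numpy as np

def unbalanced_sinkhorn(cost, a, b, eps=0.1, tau=1.0, n_iters=200):
    """Entropic unbalanced OT via Sinkhorn-style scaling iterations.

    cost : (n, m) cost matrix between patches and prompts
    a, b : reference marginals (relaxed, not strictly enforced)
    eps  : entropic regularization strength
    tau  : marginal-relaxation strength (tau -> inf recovers balanced OT)
    """
    K = np.exp(-cost / eps)
    fi = tau / (tau + eps)  # soft-marginal exponent from the KL relaxation
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = (a / (K @ v)) ** fi
        v = (b / (K.T @ u)) ** fi
    # transport plan: entry (i, j) = mass sent from patch i to prompt j
    return u[:, None] * K * v[None, :]

# toy example: 4 "patch" features aligned against 3 "prompt" features
rng = np.random.default_rng(0)
patches = rng.normal(size=(4, 8))
prompts = rng.normal(size=(3, 8))

# cosine-distance cost between L2-normalized features
pn = patches / np.linalg.norm(patches, axis=1, keepdims=True)
qn = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
cost = 1.0 - pn @ qn.T

a = np.full(4, 1 / 4)
b = np.full(3, 1 / 3)
plan = unbalanced_sinkhorn(cost, a, b)
# patches whose rows carry little total mass are effectively down-weighted,
# which is how unbalanced OT can suppress unreliable regions
```

Row sums of `plan` then serve as per-patch reliability weights: a patch that aligns poorly with every prompt simply transports less mass, rather than being forced into a bad match as in balanced OT.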

Abstract

Vision-language models offer strong few-shot capability through prompt tuning but remain vulnerable to noisy labels, which can corrupt prompts and degrade cross-modal alignment. Existing approaches struggle because they often lack the ability to model fine-grained semantic cues and to adaptively separate clean from noisy signals. To address these challenges, we propose NA-MVP, a framework for Noise-Aware few-shot learning through bi-directional Multi-View Prompt alignment. NA-MVP is built upon a key conceptual shift: robust prompt learning requires moving from global matching to region-aware alignment that explicitly distinguishes clean cues from noisy ones. To realize this, NA-MVP employs (1) multi-view prompts combined with unbalanced optimal transport to achieve fine-grained patch-to-prompt correspondence while suppressing unreliable regions; (2) a bi-directional prompt design that captures complementary clean-oriented and noise-aware cues, enabling the model to focus on stable semantics; and (3) an alignment-guided selective refinement strategy that uses optimal transport to correct only mislabeled samples while retaining reliable data. Experiments on synthetic and real-world noisy benchmarks demonstrate that NA-MVP consistently outperforms state-of-the-art baselines, confirming its effectiveness in enabling robust few-shot learning under noisy supervision.
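The selective-refinement step described in the abstract — correcting only samples flagged as mislabeled while leaving reliable ones untouched — can be illustrated with a generic sketch. Everything here is a hypothetical stand-in: `sim[i, c]` plays the role of an OT-derived alignment score between sample `i` and the prompts of class `c`, and the threshold rule is a simple proxy for the paper's alignment-guided criterion.

```python
import numpy as np

def selective_refinement(sim, labels, thresh=0.5):
    """Hypothetical sketch of alignment-guided selective refinement.

    sim    : (n_samples, n_classes) alignment scores, e.g. total
             transported mass from an OT plan per class
    labels : current (possibly noisy) integer labels
    thresh : samples whose score under their current label falls
             below this are treated as mislabeled and re-assigned
    """
    scores = sim[np.arange(len(labels)), labels]  # score of the given label
    suspect = scores < thresh                     # flag likely noisy samples
    refined = labels.copy()
    refined[suspect] = sim[suspect].argmax(axis=1)  # correct only those
    return refined, suspect

sim = np.array([[0.9, 0.1],
                [0.2, 0.8],
                [0.3, 0.7]])
labels = np.array([0, 1, 0])  # third sample is weakly aligned with its label
refined, suspect = selective_refinement(sim, labels)
# refined -> [0, 1, 1]: only the flagged sample is relabeled;
# the two well-aligned samples keep their original labels
```

The design point the abstract emphasizes is the selectivity: reliable samples are retained as-is, so refinement cannot corrupt clean supervision the way blanket relabeling could.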