Gram-Anchored Prompt Learning for Vision-Language Models via Second-Order Statistics

arXiv cs.CV / 4/7/2026



Key Points

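The paper takes issue with existing parameter-efficient prompt learning for relying mainly on the alignment of first-order spatial features, arguing that this makes it vulnerable to domain shift and local noise.
The proposed method, Gram-Anchored Prompt Learning (GAPL), adds a second-order statistical stream based on Gram matrices alongside the first-order spatial interaction, and anchors the adaptation of language representations to these second-order statistical priors.
By targeting local semantic alignment and global structural consistency at the same time, the authors claim that prompts can adapt dynamically to shifts in the statistical distribution.
Extensive experiments are reported to show that second-order features are effective and that GAPL performs well on multiple benchmarks.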

Abstract

Parameter-efficient prompt learning has become the de facto standard for adapting Vision-Language Models (VLMs) to downstream tasks. Existing approaches predominantly focus on aligning text prompts with first-order visual features (i.e., spatial feature maps). While effective for fine-grained semantic discrimination, we argue that relying solely on first-order information is insufficient for robust adaptation, as these spatially entangled features are highly susceptible to domain shifts and local noise. In this work, we propose Gram-Anchored Prompt Learning (GAPL) for Vision-Language Models via Second-Order Statistics, a framework that synergizes local semantic alignment with global structural consistency. Methodologically, we introduce an additional second-order statistical stream via Gram matrices that augments the standard first-order spatial interaction. By anchoring prompts to these second-order priors, our approach enables language representations to dynamically adapt to statistical distribution shifts across diverse domains. Extensive experiments indicate the effectiveness of the second-order features, and show compelling performance of GAPL on various benchmarks.
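
To make the distinction between first-order feature maps and second-order statistics concrete, here is a minimal sketch of a generic channel-wise Gram matrix. It is not the GAPL implementation, whose exact formulation the abstract does not give. The function name `gram_matrix`, the toy `patches` data, and the 1/N normalization are all assumptions chosen for illustration.

```python
# Illustrative sketch only: a generic channel-wise Gram matrix.
# This is NOT the GAPL method; names, data, and normalization are assumptions.

def gram_matrix(features):
    """Compute a C x C Gram matrix from N spatial feature vectors.

    features: list of N spatial positions, each a list of C channel values.
    Returns G with G[i][j] = (1/N) * sum over positions p of
    features[p][i] * features[p][j].
    """
    n = len(features)
    c = len(features[0])
    gram = [[0.0] * c for _ in range(c)]
    for vec in features:
        for i in range(c):
            for j in range(c):
                gram[i][j] += vec[i] * vec[j]
    return [[value / n for value in row] for row in gram]


# Hypothetical feature map: 4 spatial positions, 3 channels each.
patches = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
    [2.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
]

# Same feature vectors, visited in a different spatial order.
reordered = patches[::-1]

g = gram_matrix(patches)
g_reordered = gram_matrix(reordered)

# The Gram matrix sums over positions, so spatial order does not affect it.
print(g == g_reordered)
```

Because the Gram matrix sums channel co-activations over all positions, it stays the same when the positions are rearranged. This is one way to read the abstract's contrast between spatially entangled first-order features and global structural statistics.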

