Dual-Modality Anchor-Guided Filtering for Test-time Prompt Tuning
arXiv cs.CV / 4/15/2026
Key Points
- The paper proposes a dual-modality anchor-guided filtering method for test-time prompt tuning in vision-language models, aiming to select informative augmented views more reliably than entropy-only approaches.
- It introduces a text anchor using attribute-rich class descriptions for fine-grained semantic grounding, alongside an adaptive image anchor that reflects evolving test-time statistics.
- Views are filtered by their alignment with the anchors together with confidence measures rather than by confidence alone, because miscalibration under distribution shift can cause models to overvalue irrelevant crops or background regions.
- The anchors also serve as auxiliary predictive heads, and their outputs are combined in a confidence-weighted ensemble to provide a more stable supervision signal for updating prompts.
- Experiments across 15 benchmark datasets show state-of-the-art performance, suggesting anchor-guided supervision improves the robustness of prompt updates.
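
To make the key points above concrete, here is a minimal Python sketch of anchor-guided view filtering and a confidence-weighted ensemble. It is an illustration under stated assumptions, not the paper's implementation: the summary does not give the scoring formula, so the additive score, the normalized-entropy confidence, the running-mean image anchor, and every function name below are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence(probs):
    """Assumed confidence measure: 1 minus normalized entropy, clamped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - entropy(probs) / math.log(len(probs))))


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0.0 and norm_b > 0.0 else 0.0


def score_view(probs, feat, image_anchor, text_anchors):
    """Hypothetical view score: confidence plus alignment with both anchors."""
    image_align = cosine(feat, image_anchor)
    # Alignment with the closest class-description embedding.
    text_align = max(cosine(feat, t) for t in text_anchors)
    return confidence(probs) + image_align + text_align


def filter_views(view_probs, view_feats, image_anchor, text_anchors, keep):
    """Return the indices (ascending) of the `keep` highest-scoring views."""
    scores = [
        score_view(p, f, image_anchor, text_anchors)
        for p, f in zip(view_probs, view_feats)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def confidence_weighted_ensemble(head_probs):
    """Average the heads' predictions, weighting each head by its confidence."""
    weights = [confidence(p) for p in head_probs]
    total = sum(weights)
    if total == 0.0:
        # Every head is maximally uncertain: fall back to a uniform average.
        weights = [1.0] * len(head_probs)
        total = float(len(head_probs))
    num_classes = len(head_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, head_probs)) / total
        for c in range(num_classes)
    ]


def update_image_anchor(anchor, feat, momentum=0.9):
    """Assumed adaptive image anchor: exponential moving average of features."""
    return [momentum * a + (1.0 - momentum) * f for a, f in zip(anchor, feat)]
```

In this sketch, a view that looks confident but is poorly aligned with both anchors (for example, a background crop) scores lower than an object-centered view with similar entropy, which is the failure mode that entropy-only selection cannot detect. The ensemble output would then serve as the supervision target for the prompt update, which is not shown here.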




