GRPO-TTA: Test-Time Visual Tuning for Vision-Language Models via GRPO-Driven Reinforcement Learning
arXiv cs.CV / 5/6/2026
Key Points
- The paper introduces GRPO-TTA, extending Group Relative Policy Optimization (GRPO) to the test-time adaptation setting for vision-language models.
- It reformulates class-specific prompt prediction as a group-wise reinforcement learning problem by building output groups from top-K CLIP similarity candidates, allowing optimization without ground-truth labels.
- The method defines rewards tailored to test-time adaptation, including an alignment reward and a dispersion reward, to steer tuning of the visual encoder (a minimal sketch follows this list).
- Experiments on multiple benchmarks show GRPO-TTA outperforms prior test-time adaptation approaches, with especially large gains under natural distribution shifts.
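The key points above describe the method only at a high level. The PyTorch fragment below is a minimal, illustrative sketch of what one GRPO-style test-time update could look like under those assumptions: a group of candidates is built from the top-K class prompts by CLIP similarity, per-candidate rewards are standardized within the group to form relative advantages, and a policy-gradient-style loss updates the visual encoder without any ground-truth label. The function name `grpo_tta_step`, the encoder and optimizer arguments, the temperature value, and the concrete alignment and dispersion reward formulas are placeholders introduced here for illustration, not the authors' implementation.

```python
# Minimal sketch of a GRPO-style test-time adaptation step (illustrative, not the paper's code).
# Assumes a trainable CLIP-like visual encoder and frozen, L2-normalized class-prompt embeddings.
import torch
import torch.nn.functional as F

def grpo_tta_step(visual_encoder, image, text_features, optimizer, k=5, temperature=0.01):
    """One unlabeled test-time update driven by group-relative rewards.

    visual_encoder: trainable image encoder (e.g. a CLIP visual tower).
    image:          single test image tensor, shape (1, 3, H, W).
    text_features:  frozen, L2-normalized class-prompt embeddings, shape (C, D).
    """
    image_feat = F.normalize(visual_encoder(image), dim=-1)          # (1, D)
    logits = image_feat @ text_features.t() / temperature            # (1, C)
    probs = logits.softmax(dim=-1).squeeze(0)                        # (C,)

    # Build the output "group" from the top-K most similar class prompts.
    topk_probs, topk_idx = probs.topk(k)
    cand = text_features[topk_idx]                                   # (K, D)

    # Alignment reward: similarity between the image and each candidate prompt.
    alignment = (image_feat @ cand.t()).squeeze(0)                   # (K,)

    # Dispersion reward (illustrative assumption): each candidate's average
    # dissimilarity to the other prompts in the group, discouraging groups that
    # collapse onto near-duplicate classes.
    sim_matrix = cand @ cand.t()                                     # (K, K), diag == 1
    dispersion = 1.0 - (sim_matrix.sum(dim=1) - 1.0) / (k - 1)       # (K,)

    rewards = alignment + dispersion                                 # (K,)

    # Group-relative advantage, as in GRPO: standardize rewards within the group.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # Policy-gradient-style objective: raise the log-probability of
    # high-advantage candidates, with advantages treated as constants.
    log_probs = logits.log_softmax(dim=-1).squeeze(0)[topk_idx]      # (K,)
    loss = -(advantages.detach() * log_probs).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice a test-time adaptation method of this kind would typically update only a small subset of encoder parameters (for example, normalization layers) and may reset them between test samples; those choices, like the exact reward definitions above, are assumptions of this sketch rather than details confirmed by the paper summary.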