Improving Calibration in Test-Time Prompt Tuning for Vision-Language Models via Data-Free Flatness-Aware Prompt Pretraining
arXiv cs.CV / 5/1/2026
Key Points
- Test-time prompt tuning (TPT) can improve vision-language models using unlabeled test data, but it often yields poorly calibrated (less reliable) predictions.
- The study finds that common calibration-improving regularization approaches tend to steer optimization toward flatter loss minima, and that loss-landscape sharpness around adapted prompts strongly affects calibration quality.
- It introduces Flatness-aware Prompt Pretraining (FPP), which pretrains prompt initializations in flatter regions of the loss landscape before standard TPT adaptation.
- The authors report that swapping only the prompt initialization in existing TPT pipelines can improve both calibration and performance without changing other components.
- FPP is data-free, requiring no labeled data, and adds no extra test-time computational cost; code is released on GitHub.