Linking Perception, Confidence and Accuracy in MLLMs
arXiv cs.CL / 3/13/2026
💬 Opinion · Models & Research
Key Points
- The study identifies a severe confidence miscalibration problem in multimodal LLMs (MLLMs), showing that improved perception does not guarantee reliable confidence estimates.
- It proposes Confidence-Driven Reinforcement Learning (CDRL), which trains on original-noise image pairs with a confidence-based reward to sharpen perceptual sensitivity and calibrate model confidence (see the illustrative reward sketch after this list).
- It further introduces Confidence-Aware Test-Time Scaling (CA-TTS), which uses confidence signals to dynamically coordinate Self-Consistency, Self-Reflection, and Visual Self-Check modules (see the scheduling sketch below).
- An Expert Model takes on multiple roles (Planner, Critic, Voter) to schedule these modules and provide external verification, enabling robust confidence management.
- The integrated framework achieves state-of-the-art results, with consistent 8.8% gains across four benchmarks; ablation studies and favorable scaling behavior support the design.
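
A minimal sketch of the confidence-driven reward idea behind CDRL, in Python. The summary only states that CDRL trains on original-noise image pairs with a confidence-based reward; the reward shaping, function name, and weighting below are assumptions for illustration, not the paper's actual formulation.

```python
# Hypothetical sketch of a confidence-based reward over an original-noise image pair.
# The exact reward used by CDRL is not given in this summary; the shaping below only
# illustrates "reward calibrated confidence and sensitivity to image degradation".

def confidence_reward(correct_orig: bool, conf_orig: float,
                      correct_noise: bool, conf_noise: float) -> float:
    """Combine a calibration term (confidence should track correctness)
    with a sensitivity term (confidence should drop on the noisy copy)."""
    # Calibration on the original image: confident-and-correct is rewarded,
    # confident-and-wrong is penalized.
    r_orig = conf_orig if correct_orig else -conf_orig
    # Same calibration pressure on the noise-perturbed copy.
    r_noise = conf_noise if correct_noise else -conf_noise
    # Perceptual sensitivity: encourage a confidence gap between the clean
    # and degraded views of the same image.
    sensitivity = max(0.0, conf_orig - conf_noise)
    return r_orig + r_noise + sensitivity
```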
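Similarly, a hypothetical sketch of how confidence signals could gate the three test-time modules in CA-TTS. The thresholds, module interfaces, and escalation order are assumptions; the summary only says that confidence signals guide Self-Consistency, Self-Reflection, and Visual Self-Check, with an Expert Model scheduling them.

```python
from typing import Callable, Tuple

# A module maps a query to (answer text, confidence); confidence assumed in [0, 1].
Answer = Tuple[str, float]

def ca_tts(query: str,
           base_model: Callable[[str], Answer],
           self_consistency: Callable[[str], Answer],
           self_reflection: Callable[[str, str], Answer],
           visual_self_check: Callable[[str, str], Answer],
           hi: float = 0.9, lo: float = 0.5) -> Answer:
    """Confidence-gated escalation policy (hypothetical thresholds and order)."""
    answer, conf = base_model(query)
    if conf >= hi:
        # Confident direct answer: accept without spending extra test-time compute.
        return answer, conf
    if conf >= lo:
        # Mid confidence: sample multiple chains and vote (Self-Consistency).
        return self_consistency(query)
    # Low confidence: re-examine the image (Visual Self-Check), then revise
    # the reasoning (Self-Reflection).
    answer, conf = visual_self_check(query, answer)
    return self_reflection(query, answer)
```

Under this reading, extra compute is spent only on low-confidence queries, which is the usual motivation for confidence-aware test-time scaling.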
Related Articles
- I made a 'benchmark' where LLMs write code controlling units in a 1v1 RTS game. (Dev.to)
- My AI Does Not Have a Clock (Dev.to)
- How to settle on a coding LLM ? What parameters to watch out for ? (Reddit r/LocalLLaMA)
- Andrej Karpathy's autonomous AI research agent ran 700 experiments in 2 days and gave a glimpse of where AI is heading (Reddit r/artificial)
- So cursor admits that Kimi K2.5 is the best open source model (Reddit r/LocalLLaMA)