The Cost of Reasoning: Chain-of-Thought Induces Overconfidence in Vision-Language Models

arXiv cs.LG / 3/18/2026

Key Points

  • The authors show that extended reasoning through chain-of-thought prompting in vision-language models reduces the reliability of uncertainty estimates, even when it improves task accuracy.
  • The primary mechanism is implicit answer conditioning: as reasoning traces converge on a conclusion, token probabilities reflect consistency with the model's own reasoning rather than true uncertainty about correctness, leading to overconfidence.
  • In contrast, agreement-based consistency remains robust under reasoning and often improves, making it a practical uncertainty estimator in reasoning-enabled VLMs.
  • These findings have important implications for deploying VLMs in high-stakes settings and for designing reliable uncertainty quantification methods in such systems.
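The agreement-based consistency estimator mentioned above can be sketched in a few lines: sample several independent answers from the model and use the majority-vote fraction as the confidence score. This is a minimal illustration, not the paper's implementation; the `toy_sampler` below is a hypothetical stand-in for a stochastic VLM call.

```python
import random
from collections import Counter

def agreement_confidence(sample_answer, question, n_samples=8):
    """Self-consistency uncertainty estimate: confidence is the fraction
    of sampled answers that agree with the majority answer."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / n_samples

# Hypothetical stand-in for a temperature-sampled VLM: answers "cat"
# about three quarters of the time.
random.seed(0)
def toy_sampler(question):
    return random.choices(["cat", "dog"], weights=[0.75, 0.25])[0]

answer, conf = agreement_confidence(toy_sampler, "What animal is shown?")
print(answer, conf)
```

Because the score is computed from agreement across independent generations rather than from token probabilities of a single trace, it is not subject to the answer-conditioning effect described below.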

Abstract

Vision-language models (VLMs) are increasingly deployed in high-stakes settings where reliable uncertainty quantification (UQ) is as important as predictive accuracy. Extended reasoning via chain-of-thought (CoT) prompting or reasoning-trained models has become ubiquitous in modern VLM pipelines, yet its effect on UQ reliability remains poorly understood. We show that reasoning consistently degrades the quality of most uncertainty estimates, even when it improves task accuracy. We identify implicit answer conditioning as the primary mechanism: as reasoning traces converge on a conclusion before the final answer is generated, token probabilities increasingly reflect consistency with the model's own reasoning trace rather than uncertainty about correctness. In effect, the model becomes overconfident in its answer. In contrast, agreement-based consistency remains robust and often improves under reasoning, making it a practical choice for uncertainty estimation in reasoning-enabled VLMs.
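The implicit answer-conditioning mechanism can be illustrated with toy numbers (the logit values below are assumptions chosen purely for illustration, not measurements from any model): once a reasoning trace has argued its way to an answer, the softmax probability of the final answer token tracks consistency with that trace, so the same underlying uncertainty shows up as a much sharper distribution.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits over two candidate answer tokens ("A", "B") for one question.
# Direct answering: the model's genuine doubt is visible (~55% on "A").
p_direct = softmax([0.2, 0.0])
# After a CoT trace that concluded "A": the answer token mostly echoes
# the trace, yielding an overconfident ~95% even if "A" may be wrong.
p_after_cot = softmax([3.0, 0.0])

print(round(p_direct[0], 2), round(p_after_cot[0], 2))  # → 0.55 0.95
```

The reported confidence jumps from 0.55 to 0.95 without any change in the model's actual chance of being correct, which is exactly the calibration degradation the paper attributes to reasoning.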