How Do LLMs Compute Verbal Confidence?

arXiv cs.CL / 4/1/2026


Key Points

  • The paper investigates how LLMs generate “verbal confidence” tokens, addressing both *when* confidence is computed (cached during generation vs computed just-in-time) and *what* it represents internally (simple token log-probabilities vs richer representations of answer quality).
  • Using experiments on Gemma 3 27B and Qwen 2.5 7B—including activation steering, patching, noising, and swap tests—the authors find evidence that confidence is computed during answer generation and cached for later retrieval.
  • The study shows that confidence-related information emerges first in answer-adjacent hidden states, is cached at the first post-answer position, and is then retrieved when the model produces the verbalized confidence.
  • Attention blocking experiments pinpoint the information flow, indicating that confidence is gathered from answer tokens rather than being independently reconstructed at the verbalization site.
  • Linear probing and variance partitioning reveal that cached representations explain substantial variance in verbal confidence beyond token log-probabilities, supporting the view that verbal confidence is a sophisticated self-evaluation mechanism with implications for calibration and LLM metacognition.
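To make the probing and variance-partitioning idea concrete, here is a minimal toy sketch: ridge-regression probes are fit on synthetic stand-ins for the cached hidden states and for answer-token log-probabilities, and the gain in R² from adding hidden states on top of log-probabilities is read off as the variance unique to the cached representation. All shapes, data, and the `r2` helper are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: in the paper, `hidden` would be per-question
# activations from the first post-answer position, and `confidence` the
# model's stated verbal confidence.
n, d = 500, 64
hidden = rng.normal(size=(n, d))      # cached post-answer activations
logprob = rng.normal(size=(n, 1))     # mean answer-token log-probability
# In this toy setup, verbal confidence depends on both signals plus noise.
confidence = (0.3 * logprob[:, 0]
              + (hidden @ rng.normal(size=d)) * 0.1
              + rng.normal(size=n) * 0.1)

def r2(X, y, lam=1e-3):
    """In-sample R^2 of a ridge-regression probe fit on (X, y)."""
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    resid = y - X @ w
    return 1 - resid.var() / y.var()

r2_logprob = r2(logprob, confidence)
r2_hidden = r2(hidden, confidence)
r2_both = r2(np.hstack([logprob, hidden]), confidence)

# Variance in verbal confidence uniquely explained by the cached
# representation, beyond what token log-probabilities account for:
unique_hidden = r2_both - r2_logprob
print(f"log-prob only: {r2_logprob:.3f}  hidden only: {r2_hidden:.3f}  "
      f"unique to hidden: {unique_hidden:.3f}")
```

A positive `unique_hidden` is the pattern the paper reports: cached hidden states predict verbal confidence beyond a simple fluency (log-probability) readout.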

Abstract

Verbal confidence -- prompting LLMs to state their confidence as a number or category -- is widely used to extract uncertainty estimates from black-box models. However, how LLMs internally generate such scores remains unknown. We address two questions: first, when confidence is computed -- just-in-time when requested, or automatically during answer generation and cached for later retrieval; and second, what verbal confidence represents -- token log-probabilities, or a richer evaluation of answer quality? Focusing on Gemma 3 27B and Qwen 2.5 7B, we provide convergent evidence for cached retrieval. Activation steering, patching, noising, and swap experiments reveal that confidence representations emerge at answer-adjacent positions before appearing at the verbalization site. Attention blocking pinpoints the information flow: confidence is gathered from answer tokens, cached at the first post-answer position, then retrieved for output. Critically, linear probing and variance partitioning reveal that these cached representations explain substantial variance in verbal confidence beyond token log-probabilities, suggesting a richer answer-quality evaluation rather than a simple fluency readout. These findings demonstrate that verbal confidence reflects automatic, sophisticated self-evaluation -- not post-hoc reconstruction -- with implications for understanding metacognition in LLMs and improving calibration.
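The attention-blocking intervention can be sketched in isolation: forbid the query at the confidence-verbalization position from attending to answer-token keys, so any confidence signal must arrive via intermediate (e.g. post-answer) positions. The token layout, dimensions, and single-head attention below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
d, T = 16, 9
# Hypothetical layout: question tokens 0-3, answer tokens 4-6,
# first post-answer position 7, confidence token 8.
answer_pos = [4, 5, 6]
conf_pos = 8

Q = rng.normal(size=(T, d))
K = rng.normal(size=(T, d))
scores = Q @ K.T / np.sqrt(d)

# Causal mask: each position attends only to itself and earlier tokens.
causal = np.triu(np.ones((T, T), dtype=bool), k=1)
scores = np.where(causal, -np.inf, scores)
attn = softmax(scores)

# Attention blocking: zero out the confidence token's attention to the
# answer tokens by masking those scores before the softmax.
blocked = scores.copy()
blocked[conf_pos, answer_pos] = -np.inf
attn_blocked = softmax(blocked)

print("mass on answer tokens (normal): ", attn[conf_pos, answer_pos].sum())
print("mass on answer tokens (blocked):", attn_blocked[conf_pos, answer_pos].sum())
```

If verbal confidence survives this block (because it was cached at the post-answer position) but breaks when the cache position is also blocked, that is the cached-retrieval signature the authors describe.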