Confidence Should Be Calibrated More Than One Turn Deep
arXiv cs.CL / 4/8/2026
Key Points
- The paper argues that LLM confidence calibration must be treated as a dynamic, conversation-history-dependent problem rather than a static single-turn property for high-stakes multi-turn use cases.
- It introduces a multi-turn calibration task and a new metric, ECE@T, which measures calibration at each turn of a dialogue, showing that incorporating user feedback can actually worsen calibration over the course of a conversation.
- To improve calibration, the authors propose MTCal, which minimizes ECE@T using a surrogate calibration target conditioned on prior dialogue.
- They also present ConfChat, a decoding strategy that uses calibrated confidence to improve response factuality and consistency in multi-turn interactions.
- Experiments report that MTCal yields strong, consistent performance for multi-turn calibration and that ConfChat maintains or improves overall multi-turn model quality.
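The summary does not spell out how ECE@T is defined, but a natural reading is the standard binned Expected Calibration Error restricted to predictions made at turn T. The sketch below illustrates that interpretation; all function and variable names are illustrative, not taken from the paper.

```python
# Hypothetical sketch of a per-turn calibration metric ("ECE@T").
# Assumption: ECE@T is ordinary binned ECE computed only over the
# model's predictions at dialogue turn T. Names are illustrative.

def ece(confidences, correct, n_bins=10):
    """Standard binned Expected Calibration Error."""
    n = len(confidences)
    if n == 0:
        return 0.0
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # include the right edge (confidence == 1.0) in the last bin
        idx = [i for i, c in enumerate(confidences)
               if lo <= c < hi or (b == n_bins - 1 and c == hi)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        total += len(idx) / n * abs(avg_conf - accuracy)
    return total

def ece_at_turn(records, turn, n_bins=10):
    """records: list of (turn, confidence, is_correct) tuples.
    Returns ECE over only the predictions made at the given turn."""
    subset = [(c, y) for t, c, y in records if t == turn]
    confs = [c for c, _ in subset]
    labels = [y for _, y in subset]
    return ece(confs, labels, n_bins)
```

Tracking `ece_at_turn` for T = 1, 2, 3, ... over a dialogue dataset would reveal the kind of turn-by-turn calibration drift the paper reports, e.g. calibration degrading after corrective user feedback.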