Anchored Confabulation: Partial Evidence Non-Monotonically Amplifies Confident Hallucination in LLMs

arXiv cs.CL · April 30, 2026


Key Points

  • The paper identifies a new calibration behavior in LLMs: supplying a single confirmed intermediate fact can temporarily increase the model’s confident-wrong answers before full evidence eliminates the effect, a phenomenon the authors call “anchored confabulation.”
  • They formalize this effect as Parametric Hallucination Confidence (PHC) and validate it across multiple evidence types, including a causal injection experiment and cross-family scaling results.
  • A proposed “Anchoring Threshold Law” predicts how PHC amplification grows with reasoning hop depth, validated by four confirmed predictions across hop depths.
  • The authors demonstrate an application to RAG routing: a LearnedRouter that exploits PHC substantially reduces the oracle performance gap without model fine-tuning, using far fewer labels than earlier RL-based approaches.
  • Mitigation experiments suggest that an epistemic-humility prompt and explicit self-rating can reduce PHC spikes, with self-rating outperforming lexical confidence for routing signals.
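The last two points describe routing RAG queries on an explicit self-rating signal. As a minimal sketch of that idea (the paper's LearnedRouter is a trained classifier; the elicitation prompt, threshold value, and function names below are illustrative assumptions, not the authors' implementation):

```python
def elicit_self_rating(model_answer: str, stated_confidence: float) -> float:
    """Stand-in for asking the model to explicitly self-rate its answer
    on a 0-1 scale. In practice this would be a second prompt to the LLM;
    here we just pass the rating through. (Hypothetical helper.)"""
    return max(0.0, min(1.0, stated_confidence))


def route_query(self_rating: float, threshold: float = 0.7) -> str:
    """Route to parametric answering when self-rated confidence is high,
    otherwise fall back to retrieval. A fixed cutoff is a simplification:
    the paper learns this decision rather than hard-coding a threshold."""
    return "parametric" if self_rating >= threshold else "retrieve"


# Example: a confidently self-rated answer skips retrieval;
# a low self-rating triggers retrieval augmentation.
print(route_query(elicit_self_rating("Paris", 0.92)))  # parametric
print(route_query(elicit_self_rating("unsure", 0.35)))  # retrieve
```

The key finding motivating this design is that the explicit self-rating (PHC=0.684) is a more reliable routing signal than lexical confidence markers in the answer text.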

Abstract

We identify a previously unknown calibration property of large language models: providing one confirmed intermediate fact toward a multi-step reasoning chain increases the model's confident-wrong-answer rate before full evidence eliminates it. We call this anchored confabulation: a partial anchor commits the model to confident parametric completion of the remaining reasoning steps. We formalize it as Parametric Hallucination Confidence (PHC) and establish it across six lines of evidence, including a causal injection experiment (PHC 0.613 → 0.656 → 0.595 → 0.536, N=160) and capability scaling across five model families (Spearman rho=0.900, p=0.037). The Anchoring Threshold Law k*(n) = floor(n/3) predicts PHC amplification by hop depth, with four confirmed predictions. Applied to RAG routing, a LearnedRouter exploiting PHC closes 81.1% of the oracle performance gap (macro F1=0.426, p<1e-6) on 1,800 queries across four benchmarks with no model fine-tuning and 50x fewer labels than prior RL-based work. An epistemic-humility prompt reduces the PHC spike by 0.118; explicit self-rating (PHC=0.684, p<0.001) outperforms lexical confidence as a routing signal.
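The Anchoring Threshold Law quoted in the abstract is a simple closed form. A one-function sketch of it (the interpretation in the docstring is inferred from the abstract alone; the paper may define the quantities more precisely):

```python
def anchoring_threshold(n: int) -> int:
    """Anchoring Threshold Law from the abstract: k*(n) = floor(n/3).

    Reads as: for an n-hop reasoning chain, k*(n) is the number of
    confirmed intermediate facts associated with peak PHC amplification
    (interpretation assumed from the abstract, not the full paper).
    """
    if n < 0:
        raise ValueError("hop depth must be non-negative")
    return n // 3  # integer floor division implements floor(n/3) for n >= 0


# Example values for small hop depths:
for n in (2, 3, 6, 10):
    print(n, anchoring_threshold(n))
```

So a 3-hop chain has its predicted peak after one confirmed fact, and a 10-hop chain after three, which is the hop-depth scaling the law is claimed to predict.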