Anchored Confabulation: Partial Evidence Non-Monotonically Amplifies Confident Hallucination in LLMs

arXiv cs.CL · April 30, 2026


Key Points

  • The paper identifies a new calibration behavior in LLMs: supplying a single confirmed intermediate fact can temporarily increase the model’s confident-wrong answers before full evidence eliminates the effect, a phenomenon the authors call “anchored confabulation.”
  • They formalize this effect as Parametric Hallucination Confidence (PHC) and validate it across multiple evidence types, including a causal injection experiment and cross-family scaling results.
  • A proposed “Anchoring Threshold Law” predicts how PHC amplification grows with reasoning hop depth, validated by four confirmed predictions across hop depths.
  • The authors demonstrate an application to RAG routing: a LearnedRouter that exploits PHC substantially reduces the oracle performance gap without model fine-tuning, using far fewer labels than earlier RL-based approaches.
  • Mitigation experiments suggest that an epistemic-humility prompt and explicit self-rating can reduce PHC spikes, with self-rating outperforming lexical confidence for routing signals.
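The last two points describe routing RAG queries on an explicit self-rating signal. As a minimal sketch of that idea (the paper's LearnedRouter is a trained classifier; the elicitation prompt, threshold value, and function names below are illustrative assumptions, not the authors' implementation):

```python
def elicit_self_rating(model_answer: str, stated_confidence: float) -> float:
    """Stand-in for asking the model to explicitly self-rate its answer
    on a 0-1 scale. In practice this would be a second prompt to the LLM;
    here we just pass the rating through. (Hypothetical helper.)"""
    return max(0.0, min(1.0, stated_confidence))


def route_query(self_rating: float, threshold: float = 0.7) -> str:
    """Route to parametric answering when self-rated confidence is high,
    otherwise fall back to retrieval. A fixed cutoff is a simplification:
    the paper learns this decision rather than hard-coding a threshold."""
    return "parametric" if self_rating >= threshold else "retrieve"


# Example: a confidently self-rated answer skips retrieval;
# a low self-rating triggers retrieval augmentation.
print(route_query(elicit_self_rating("Paris", 0.92)))  # parametric
print(route_query(elicit_self_rating("unsure", 0.35)))  # retrieve
```

The key finding motivating this design is that the explicit self-rating (PHC=0.684) is a more reliable routing signal than lexical confidence markers in the answer text.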

Abstract

We identify a previously unknown calibration property of large language models: providing one confirmed intermediate fact toward a multi-step reasoning chain increases the model's confident-wrong-answer rate before full evidence eliminates it. We call this anchored confabulation: a partial anchor commits the model to confident parametric completion of the remaining reasoning steps. We formalize it as Parametric Hallucination Confidence (PHC) and establish it across six lines of evidence, including a causal injection experiment (PHC 0.613 → 0.656 → 0.595 → 0.536, N=160) and capability scaling across five model families (Spearman rho=0.900, p=0.037). The Anchoring Threshold Law k*(n) = floor(n/3) predicts PHC amplification by hop depth, with four confirmed predictions. Applied to RAG routing, a LearnedRouter exploiting PHC closes 81.1% of the oracle performance gap (macro F1=0.426, p<1e-6) on 1,800 queries across four benchmarks with no model fine-tuning and 50x fewer labels than prior RL-based work. An epistemic-humility prompt reduces the PHC spike by 0.118; explicit self-rating (PHC=0.684, p<0.001) outperforms lexical confidence as a routing signal.
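The Anchoring Threshold Law quoted in the abstract is a simple closed form. A one-function sketch of it (the interpretation in the docstring is inferred from the abstract alone; the paper may define the quantities more precisely):

```python
def anchoring_threshold(n: int) -> int:
    """Anchoring Threshold Law from the abstract: k*(n) = floor(n/3).

    Reads as: for an n-hop reasoning chain, k*(n) is the number of
    confirmed intermediate facts associated with peak PHC amplification
    (interpretation assumed from the abstract, not the full paper).
    """
    if n < 0:
        raise ValueError("hop depth must be non-negative")
    return n // 3  # integer floor division implements floor(n/3) for n >= 0


# Example values for small hop depths:
for n in (2, 3, 6, 10):
    print(n, anchoring_threshold(n))
```

So a 3-hop chain has its predicted peak after one confirmed fact, and a 10-hop chain after three, which is the hop-depth scaling the law is claimed to predict.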