The α-Law of Observable Belief Revision in Large Language Model Inference
arXiv cs.AI / 3/23/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper identifies the alpha-law, a multiplicative scaling rule that governs how instruction-tuned LLMs revise probability assignments over candidate answers, parameterized by a belief revision exponent.
- It proves that exponent values below one are necessary and sufficient for asymptotic stability under repeated revisions.
- Empirical evaluation across 4,975 problems from GPQA Diamond, TheoremQA, MMLU-Pro, and ARC-Challenge, and across model families (GPT-5.2 and Claude Sonnet 4), shows near-Bayesian updating, with single-step revision exponents slightly above the stability boundary.
- In multi-step revisions, the exponent decreases over time, producing contractive long-run dynamics consistent with the theoretical stability predictions.
- Token-level validation using Llama-3.3-70B, together with architecture-specific trust-ratio patterns (GPT-5.2 balancing prior against evidence vs. Claude prioritizing new evidence), demonstrates the phenomenon in both log-probabilities and self-reported confidence. The work positions the alpha-law as a principled diagnostic for monitoring update stability and reasoning quality in LLM inference systems.
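The digest does not give the paper's exact functional form, but the key points (a multiplicative revision rule with an exponent, stability exactly when the exponent is below one, collapse otherwise) are consistent with a tempered-Bayesian update. The sketch below is a hypothetical illustration of that dynamic; the function name `alpha_update`, the specific form `prior**alpha * evidence`, and all numeric values are assumptions, not the paper's definitions.

```python
import numpy as np

def alpha_update(p, evidence, alpha):
    """One hypothetical alpha-law revision step: the new belief is
    proportional to prior**alpha times the evidence weight
    (a tempered-Bayesian / power-law form, assumed for illustration)."""
    q = (p ** alpha) * evidence
    return q / q.sum()

# Toy setup: three candidate answers (all values illustrative).
p0 = np.array([0.2, 0.5, 0.3])        # initial belief over answers
evidence = np.array([0.5, 0.3, 0.2])  # fixed per-step evidence weights

results = {}
for alpha in (0.5, 1.5):
    b = p0.copy()
    for _ in range(200):              # repeated revisions
        b = alpha_update(b, evidence, alpha)
    results[alpha] = b

# alpha < 1: contractive -- belief settles at a fixed point
# (here proportional to evidence**2), independent of the prior.
print(np.round(results[0.5], 3))
# alpha > 1: expansive -- belief collapses onto a single answer.
print(np.round(results[1.5], 3))
```

In log-space the update is affine with slope alpha, so differences between candidates shrink by a factor of alpha per step when alpha < 1 and grow when alpha > 1, which is one way to see why the stability boundary sits at one.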