Central Limit Theorems for Transition Probabilities of Controlled Markov Chains

arXiv stat.ML / 3/26/2026


Key Points

  • The paper develops a central limit theorem (CLT) for a non-parametric estimator of transition matrices in controlled Markov chains with finite state-action spaces.
  • It specifies precise conditions on the logging policy under which the estimator is asymptotically normal, and identifies settings where no CLT can exist.
  • The authors extend the CLT results to derive asymptotic normality for value, Q-, and advantage functions of any stationary stochastic policy, including optimal policy recovery from the estimated transition model.
  • As a corollary, the work derives goodness-of-fit tests to check whether logged data is stochastic, enabling hypothesis tests for transition probabilities.
  • Overall, the paper provides new statistical tools for offline policy evaluation and offline optimal policy recovery with uncertainty quantification via asymptotic inference.
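To make the first key point concrete, here is a minimal sketch of a count-based (non-parametric) transition-matrix estimator from logged (s, a, s') triples, together with an entrywise Wald-style confidence interval of the kind a CLT justifies. This is a generic illustration, not the paper's construction: the function name, the interface, and the simple plug-in variance `p(1-p)/N(s,a)` are assumptions, and the paper's actual conditions on the logging policy are more refined.

```python
import numpy as np

def estimate_transitions(transitions, n_states, n_actions, z=1.96):
    """Count-based estimator of P(s' | s, a) from logged (s, a, s') triples.

    Returns the estimated transition tensor p_hat[s, a, s'] and an entrywise
    normal-approximation half-width z * sqrt(p(1-p) / N(s, a)).  This is an
    illustrative sketch; the paper's CLT specifies when such normal
    approximations are actually valid for data logged from a CMC.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1
    visits = counts.sum(axis=2, keepdims=True)  # N(s, a): visit counts per pair
    with np.errstate(invalid="ignore", divide="ignore"):
        p_hat = np.where(visits > 0, counts / visits, 0.0)
    # CLT-based half-width for each entry; infinite for unvisited (s, a) pairs.
    half = np.where(
        visits > 0,
        z * np.sqrt(p_hat * (1.0 - p_hat) / np.maximum(visits, 1)),
        np.inf,
    )
    return p_hat, half
```

For visited state-action pairs, each row of `p_hat[s, a, :]` is a probability distribution over next states; the half-widths shrink at the usual 1/sqrt(N) rate as the pair is visited more often.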

Abstract

We develop a central limit theorem (CLT) for a non-parametric estimator of the transition matrices in controlled Markov chains (CMCs) with finite state-action spaces. Our results establish precise conditions on the logging policy under which the estimator is asymptotically normal, and reveal settings in which no CLT can exist. We then build on these results to derive CLTs for the value, Q-, and advantage functions of any stationary stochastic policy, including the optimal policy recovered from the estimated model. Goodness-of-fit tests are derived as a corollary, which make it possible to test whether the logged data is stochastic. These results provide new statistical tools for offline policy evaluation and optimal policy recovery, and enable hypothesis tests for transition probabilities.
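As a rough illustration of the hypothesis tests the abstract mentions, the classical Pearson chi-square goodness-of-fit test compares observed next-state counts at a fixed (s, a) pair against a hypothesized transition row. The function below is a standard textbook test, not the paper's procedure; its name and interface are assumptions, and the paper's contribution is establishing when such asymptotic tests are valid for CMC-logged data.

```python
import numpy as np
from scipy.stats import chisquare

def transition_gof_test(next_state_counts, p0):
    """Pearson chi-square test of H0: P(. | s, a) = p0 for one (s, a) pair.

    next_state_counts: observed counts of each next state after (s, a).
    p0: hypothesized transition probability vector over next states.
    Returns the chi-square statistic and asymptotic p-value.
    """
    n = next_state_counts.sum()
    stat, pval = chisquare(next_state_counts, f_exp=n * np.asarray(p0))
    return stat, pval
```

A large statistic (small p-value) rejects the hypothesized transition row; applying such tests across all state-action pairs requires a multiple-testing correction.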