Gaussian Approximation for Asynchronous Q-learning

arXiv stat.ML / 4/9/2026



Key Points

The paper derives rates of convergence in the high-dimensional central limit theorem for Polyak-Ruppert averaged iterates of asynchronous Q-learning with polynomial step sizes k^{-ω}, ω in (1/2, 1].
It proves a high-dimensional central limit theorem for sums of martingale differences that arise in the asynchronous Q-learning setting.
Assuming the state-action-next-state triples form a uniformly geometrically ergodic Markov chain, the authors obtain a rate of order up to n^{-1/6} log^4(nSA) over the class of hyper-rectangles, where n is the number of samples and S and A are the numbers of states and actions.
The work also provides bounds on high-order moments for the algorithm’s last iterate, offering further finite-sample characterization.
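
For orientation, the objects named in the key points can be written in their usual textbook form. This is a sketch under assumptions, not the paper's own notation: the reward r, discount factor \gamma, step-size constant c_0, optimal Q-function Q^\star, limiting covariance \Sigma, and the indexing from k = 1 are all introduced here for illustration.

```latex
% Asynchronous Q-learning: at step k only the visited pair (s_k, a_k) is updated;
% all other entries of Q_k are carried over unchanged.
Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k)
  + \alpha_k \Bigl( r(s_k, a_k) + \gamma \max_{a'} Q_k(s_{k+1}, a') - Q_k(s_k, a_k) \Bigr),
\qquad \alpha_k = c_0 k^{-\omega}, \quad \omega \in (1/2, 1].

% Polyak-Ruppert average of the first n iterates.
\bar{Q}_n = \frac{1}{n} \sum_{k=1}^{n} Q_k.

% Usual form of a Gaussian approximation bound over hyper-rectangles
% (assumed shape of the statement; \mathcal{R} is the class of hyper-rectangles in \mathbb{R}^{SA}).
\sup_{B \in \mathcal{R}} \Bigl| \mathbb{P}\bigl( \sqrt{n}\,(\bar{Q}_n - Q^\star) \in B \bigr)
  - \mathbb{P}\bigl( \xi \in B \bigr) \Bigr|, \qquad \xi \sim \mathcal{N}(0, \Sigma).
```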

Abstract

In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak-Ruppert averaged iterates generated by the asynchronous Q-learning algorithm with a polynomial stepsize $k^{-\omega},\, \omega \in (1/2, 1]$. Assuming that the sequence of state-action-next-state triples $(s_k, a_k, s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, we establish a rate of order up to $n^{-1/6} \log^{4} (nSA)$ over the class of hyper-rectangles, where $n$ is the number of samples used by the algorithm and $S$ and $A$ denote the numbers of states and actions, respectively. To obtain this result, we prove a high-dimensional central limit theorem for sums of martingale differences, which may be of independent interest. Finally, we present bounds for high-order moments for the algorithm's last iterate.
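
To make the setting concrete, here is a minimal Python sketch of asynchronous Q-learning with a polynomial step size and Polyak-Ruppert averaging. It is an illustration under assumptions, not the authors' implementation: the function name `async_q_learning`, the tabular inputs `P` and `R`, the constant `c0`, the uniformly random behaviour policy, and the deterministic rewards are all hypothetical choices made for this example.

```python
import random


def async_q_learning(P, R, gamma, n, omega=0.75, c0=1.0, seed=0):
    """Run n steps of asynchronous Q-learning along a single trajectory.

    P[s][a] is a list of next-state probabilities and R[s][a] a deterministic
    reward (both hypothetical inputs for this sketch). Returns the last
    iterate Q and the Polyak-Ruppert average of the iterates Q_1, ..., Q_n.
    """
    if not 0.5 < omega <= 1.0:
        raise ValueError("omega must lie in (1/2, 1]")
    rng = random.Random(seed)
    S, A = len(P), len(P[0])
    Q = [[0.0] * A for _ in range(S)]
    Q_bar = [[0.0] * A for _ in range(S)]
    s = 0
    for k in range(1, n + 1):
        # Assumption: actions come from a uniformly random behaviour policy.
        a = rng.randrange(A)
        s_next = rng.choices(range(S), weights=P[s][a])[0]
        alpha = c0 * k ** (-omega)  # polynomial step size
        td_error = R[s][a] + gamma * max(Q[s_next]) - Q[s][a]
        # Asynchronous update: only the visited entry (s, a) changes.
        Q[s][a] += alpha * td_error
        # Running mean, so that Q_bar equals the average of the first k iterates.
        for i in range(S):
            for j in range(A):
                Q_bar[i][j] += (Q[i][j] - Q_bar[i][j]) / k
        s = s_next
    return Q, Q_bar


if __name__ == "__main__":
    # Toy MDP: 2 states, 2 actions, uniform transitions, reward 1 everywhere,
    # so the optimal Q-value is 1 / (1 - gamma) in every entry.
    P = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
    R = [[1.0, 1.0], [1.0, 1.0]]
    Q, Q_bar = async_q_learning(P, R, gamma=0.5, n=20000, omega=0.6)
    print("last iterate:", Q)
    print("averaged iterate:", Q_bar)
```

The averaged iterate is maintained as a running mean rather than by storing the whole trajectory, which keeps memory at the size of a single Q-table. The sketch only shows the iterates the abstract refers to; it does not attempt to reproduce the Gaussian approximation rate itself.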
