Central Limit Theorems for Asynchronous Averaged Q-Learning

arXiv stat.ML · April 21, 2026


Key Points

  • The paper proves central limit theorems for Polyak-Ruppert averaged Q-learning when updates occur asynchronously, extending prior results to more realistic training settings.
  • It provides a non-asymptotic central limit theorem with an explicit convergence rate in Wasserstein distance that depends on iteration count, the size of the state-action space, the discount factor, and exploration quality.
  • It also derives a functional central limit theorem showing that the cumulative partial-sum process converges weakly to a Brownian motion.
  • Overall, the work gives rigorous statistical guarantees and quantitative error scaling for stochastic approximation dynamics in asynchronous reinforcement learning.
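The setting analyzed above can be illustrated with a minimal sketch: asynchronous Q-learning updates a single visited state-action entry per step, while a running Polyak-Ruppert average of the iterates is maintained on the side. The toy MDP, the helper names (`step`, `averaged_q_learning`), and the polynomial step size are illustrative assumptions, not the paper's construction.

```python
import random

def step(s, a, rng):
    """Sample next state and reward in a tiny synthetic 2-state MDP
    (an assumed toy environment, not from the paper)."""
    s_next = (s + a) % 2
    reward = 1.0 if (s == 1 and a == 1) else 0.0
    return s_next, reward

def averaged_q_learning(n_iters=5000, gamma=0.5, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    q = [[0.0] * n_actions for _ in range(n_states)]      # running iterate Q_t
    q_bar = [[0.0] * n_actions for _ in range(n_states)]  # Polyak-Ruppert average
    s = 0
    for t in range(1, n_iters + 1):
        a = rng.randrange(n_actions)                      # uniform exploration
        s_next, r = step(s, a, rng)
        # Asynchronous update: only the visited (s, a) entry changes.
        target = r + gamma * max(q[s_next])
        alpha = 1.0 / (t ** 0.75)                         # polynomial step size (assumed)
        q[s][a] += alpha * (target - q[s][a])
        # Online Polyak-Ruppert averaging: qbar_t = qbar_{t-1} + (q_t - qbar_{t-1}) / t
        for i in range(n_states):
            for j in range(n_actions):
                q_bar[i][j] += (q[i][j] - q_bar[i][j]) / t
        s = s_next
    return q_bar
```

Under uniform exploration every entry is visited infinitely often, so the averaged iterate `q_bar` tracks the optimal Q-function; the CLTs in the paper quantify the fluctuations of exactly this kind of average around its limit.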

Abstract

This paper establishes central limit theorems for Polyak-Ruppert averaged Q-learning under asynchronous updates. We prove a non-asymptotic central limit theorem, where the convergence rate in Wasserstein distance explicitly reflects the dependence on the number of iterations, state-action space size, the discount factor, and the quality of exploration. In addition, we derive a functional central limit theorem, showing that the partial-sum process converges weakly to a Brownian motion.
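Schematically, the results take the standard Polyak-Ruppert form sketched below; the symbols, the Wasserstein order, and the constant $C$ are generic placeholders, not the paper's exact statement.

```latex
% Averaged iterate and schematic CLT (standard form; constants are placeholders).
\bar{Q}_T = \frac{1}{T}\sum_{t=1}^{T} Q_t,
\qquad
\sqrt{T}\,\bigl(\bar{Q}_T - Q^\star\bigr) \;\xrightarrow{d}\; \mathcal{N}(0,\Sigma).

% Non-asymptotic version: a Wasserstein bound whose constant depends on the
% state-action space size, the discount factor, and exploration quality.
\mathcal{W}\Bigl(\operatorname{Law}\bigl(\sqrt{T}(\bar{Q}_T - Q^\star)\bigr),\,
  \mathcal{N}(0,\Sigma)\Bigr)
\;\le\; \frac{C\bigl(|\mathcal{S}||\mathcal{A}|,\ \gamma,\ \text{exploration}\bigr)}{T^{c}}.

% Functional CLT: the rescaled partial-sum process converges weakly to
% a Brownian motion.
\Bigl(\tfrac{1}{\sqrt{T}}\textstyle\sum_{t=1}^{\lfloor uT \rfloor}
  \bigl(Q_t - Q^\star\bigr)\Bigr)_{u \in [0,1]}
\;\Rightarrow\; \bigl(B_\Sigma(u)\bigr)_{u \in [0,1]}.
```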