Stochastic approximation in non-Markovian environments revisited

arXiv stat.ML / 3/24/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper revisits stochastic approximation when the driving process is non-Markovian and additionally non-ergodic, expanding the theoretical setting beyond standard assumptions.
  • It develops an analytic framework aimed at explaining transformer-based learning, with a focus on how the attention mechanism relates to learning dynamics.
  • The framework is also positioned to inform continual learning, emphasizing that such methods may depend on the full history of data in principle.
  • The work is presented as an arXiv preprint (v1) and builds on the author’s prior research on non-Markovian stochastic approximation.

Abstract

Building on recent work of the author on stochastic approximation in non-Markovian environments, we consider the setting in which the driving random process is non-ergodic in addition to being non-Markovian. Using this, we propose an analytic framework for understanding transformer-based learning, specifically the `attention' mechanism, and continual learning, both of which depend on the entire past in principle.
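To make the setting concrete, here is a minimal toy sketch (not the paper's construction) of a Robbins–Monro stochastic approximation iteration whose noise depends on the entire past (non-Markovian) and carries a bias fixed once per realization (so time averages differ across runs, a simple form of non-ergodicity). All function and variable names here are illustrative assumptions.

```python
import random

def robbins_monro(steps=20000, seed=0):
    """Toy iteration x_{n+1} = x_n + a_n * (h(x_n) + xi_n) with h(x) = -x.

    The noise xi_n is non-Markovian (it depends on the average of all
    past shocks) and non-ergodic (it carries a random bias drawn once
    per realization, so different runs converge to different limits).
    """
    rng = random.Random(seed)
    x = 5.0
    history = []                      # full past of fresh shocks
    bias = rng.choice([-0.5, 0.5])    # fixed per realization -> non-ergodic
    for n in range(1, steps + 1):
        fresh = rng.gauss(0.0, 1.0)
        # long-memory component: average of the entire history of shocks
        memory = sum(history) / len(history) if history else 0.0
        xi = bias + fresh + memory
        history.append(fresh)
        a_n = 1.0 / n                 # standard step sizes, sum a_n = inf
        x = x + a_n * (-x + xi)       # drift h(x) = -x pulls toward 0
    return x
```

With `a_n = 1/n` the iterate is effectively a running average of the noise, so `x` settles near the realization's `bias` (±0.5) rather than near the root of `h`: the limit depends on which realization of the driving process occurred, which is exactly the complication a non-ergodic analysis must handle.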