Finite-Time Decoupled Convergence in Nonlinear Two-Time-Scale Stochastic Approximation

arXiv stat.ML / 4/14/2026


Key Points

  • The paper studies two-time-scale stochastic approximation (SA) and asks whether nonlinear settings can still achieve “decoupled convergence,” in which the mean-square error of each iterate converges at a rate that depends only on that iterate’s own step size.
  • It proves that finite-time decoupled convergence rates are attainable in nonlinear two-time-scale SA under a nested local linearity assumption, provided that step sizes are selected appropriately.
  • To obtain this result, the analysis studies the convergence of the matrix cross term between the iterates and uses fourth-order moment convergence rates to control the higher-order error terms that local linearity introduces.
  • Through a constructed example, the authors show that local linearity cannot simply be dropped: even when the fast-time-scale update is linear, nonlinearity in the slow-time-scale update alone can destroy decoupled convergence.
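
For reference, one common way to write the two-time-scale SA recursion and the decoupled rates is sketched below. The symbols (slow iterate x_k with step size α_k, fast iterate y_k with step size β_k, mean fields f and g, noise terms ξ_k and ψ_k, and the fast-iterate solution map y^*(x)) are assumed notation, not necessarily the paper's, and the paper's nested local linearity condition is not reproduced here.

```latex
% Two-time-scale SA (assumed notation): the slow iterate uses the smaller step size.
\begin{aligned}
x_{k+1} &= x_k + \alpha_k \bigl( f(x_k, y_k) + \xi_k \bigr)  && \text{(slow iterate)} \\
y_{k+1} &= y_k + \beta_k  \bigl( g(x_k, y_k) + \psi_k \bigr) && \text{(fast iterate)}
\end{aligned}
\qquad \frac{\alpha_k}{\beta_k} \to 0 .

% Targets: y^*(x) solves g(x, y^*(x)) = 0, and x^* solves f(x^*, y^*(x^*)) = 0.
% Decoupled convergence: each mean-square error is governed by its own step size.
\mathbb{E}\,\lVert x_k - x^* \rVert^2 = O(\alpha_k),
\qquad
\mathbb{E}\,\lVert y_k - y^*(x_k) \rVert^2 = O(\beta_k).
```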

Abstract

In two-time-scale stochastic approximation (SA), two iterates are updated at varying speeds using different step sizes, with each update influencing the other. Previous studies on linear two-time-scale SA have shown that the convergence rates of the mean-square errors for these updates depend solely on their respective step sizes, a phenomenon termed decoupled convergence. However, achieving decoupled convergence in nonlinear SA remains less understood. Our research investigates the potential for finite-time decoupled convergence in nonlinear two-time-scale SA. We demonstrate that, under a nested local linearity assumption, finite-time decoupled convergence rates can be achieved with suitable step size selection. To derive this result, we conduct a convergence analysis of the matrix cross term between the iterates and leverage fourth-order moment convergence rates to control the higher-order error terms induced by local linearity. To further investigate the necessity of local linearity for decoupled convergence, we also construct an example showing that, even when the fast-time-scale update is linear, the nonlinearity of the slow-time-scale update alone can destroy decoupled convergence.
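
To make the setting concrete, the following Python sketch simulates a toy *linear* two-time-scale SA, the case in which prior work established decoupled convergence. The model (mean fields f(x, y) = 1 - y and g(x, y) = x - y), the constants (`K0`, step sizes decaying like 1/k and k^(-2/3), unit Gaussian noise) and the helper names `alpha`, `beta` and `simulate` are illustrative assumptions, not taken from the paper; the script does not implement the paper's nonlinear analysis or its counterexample.

```python
import random

# Toy linear two-time-scale SA (an illustrative assumption, not the paper's model):
#   slow: x <- x + alpha_k * (1 - y + noise)   mean field f(x, y) = 1 - y
#   fast: y <- y + beta_k  * (x - y + noise)   mean field g(x, y) = x - y
# Here y*(x) = x solves g(x, y) = 0, and x* = 1 solves f(x, y*(x)) = 0.

K0 = 10  # hypothetical step-size offset that keeps the early steps small


def alpha(k):
    """Slow step size, decaying like 1/k."""
    return 1.0 / (k + K0)


def beta(k):
    """Fast step size, decaying like k^(-2/3), so alpha(k) / beta(k) -> 0."""
    return 1.0 / (k + K0) ** (2.0 / 3.0)


def simulate(n_steps, n_runs, seed=0):
    """Monte Carlo estimate of the final mean-square errors of both iterates."""
    rng = random.Random(seed)
    alphas = [alpha(k) for k in range(n_steps)]
    betas = [beta(k) for k in range(n_steps)]
    x_star = 1.0
    sq_err_x = sq_err_y = 0.0
    for _ in range(n_runs):
        x, y = 0.0, 0.0
        for a_k, b_k in zip(alphas, betas):
            # Both updates read the same old (x, y), so each iterate drives the other.
            x, y = (x + a_k * (1.0 - y + rng.gauss(0.0, 1.0)),
                    y + b_k * (x - y + rng.gauss(0.0, 1.0)))
        sq_err_x += (x - x_star) ** 2
        sq_err_y += (y - x) ** 2  # fast error measured against y*(x_k) = x_k
    return sq_err_x / n_runs, sq_err_y / n_runs


N = 4000
mse_x, mse_y = simulate(n_steps=N, n_runs=200)
# Normalize each MSE by that iterate's own final step size.
ratio_x = mse_x / alpha(N - 1)
ratio_y = mse_y / beta(N - 1)
print(f"slow iterate: MSE = {mse_x:.2e}, MSE / alpha = {ratio_x:.2f}")
print(f"fast iterate: MSE = {mse_y:.2e}, MSE / beta  = {ratio_y:.2f}")
```

Dividing each mean-square error by that iterate's own step size is a crude check of decoupling: if each rate depends only on the iterate's own step size, both ratios should stay roughly bounded as `N` is increased. Because the estimate comes from a finite Monte Carlo sample, the printed ratios fluctuate and are best read as orders of magnitude.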