Stability and Generalization for Decentralized Markov SGD

arXiv cs.LG / 5/5/2026

Key Points

  • The paper studies the stability and generalization of decentralized SGD and SGDA when training data are generated by Markov-chain-dependent sampling rather than drawn i.i.d. (see the sketch after this list).
  • It uses a stability-based analytical framework to explain how Markovian dependence and decentralized communication jointly affect generalization.
  • The authors derive non-asymptotic generalization bounds that incorporate network topology, Markov chain mixing properties, and the primal-dual dynamics in the optimization process.
  • Results extend existing theory for Markov stochastic gradient methods to both decentralized learning and minimax (saddle-point) settings.
  • The work tackles the analytical challenges that arise when correlated data streams meet decentralized optimization, yielding tools for predicting generalization behavior in such systems.
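
To make the setting in the first key point concrete, here is a minimal sketch of decentralized SGD under Markov-chain sampling. The ring topology, the lazy random walk over data indices, and the least-squares loss are illustrative assumptions, not the paper's construction; any ergodic chain and doubly stochastic mixing matrix `W` fits the same template.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, m_local, steps, lr = 4, 5, 50, 200, 0.05

# Ring topology: each node averages itself and its two neighbors.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

# Synthetic local datasets for a least-squares objective (assumption).
X = rng.normal(size=(n_nodes, m_local, dim))
w_true = rng.normal(size=dim)
y = X @ w_true + 0.1 * rng.normal(size=(n_nodes, m_local))

def next_index(idx):
    # Lazy random walk over data indices: stay with prob. 1/2, else jump
    # uniformly. One simple ergodic chain; successive samples are correlated,
    # unlike i.i.d. sampling, which is the crux of the analysis.
    return idx if rng.random() < 0.5 else rng.integers(m_local)

w = np.zeros((n_nodes, dim))              # one local model per node
idx = rng.integers(m_local, size=n_nodes)

for t in range(steps):
    grads = np.empty_like(w)
    for i in range(n_nodes):
        idx[i] = next_index(idx[i])       # Markov-chain sample, not i.i.d.
        x, target = X[i, idx[i]], y[i, idx[i]]
        grads[i] = (x @ w[i] - target) * x  # gradient of 0.5*(x·w - y)^2
    w = W @ (w - lr * grads)              # local SGD step, then gossip averaging

print("consensus distance:", np.linalg.norm(w - w.mean(axis=0)))
```

The spectral gap of `W` governs how fast the nodes reach consensus, while the chain's mixing time governs how quickly the samples "forget" their past; the paper's bounds quantify how both enter the generalization behavior.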

Abstract

Stochastic gradient methods are central to large-scale learning, yet their generalization theory typically relies on independent sampling assumptions. In many practical applications, data are generated by Markov chains and learning is performed in a decentralized manner, which introduces significant analytical challenges. In this work, we investigate the stability and generalization of decentralized stochastic gradient descent (SGD) and stochastic gradient descent ascent (SGDA) under Markov chain sampling. Leveraging a stability-based framework, we characterize how Markovian dependence and decentralized communication jointly influence generalization behavior. Our analysis captures the effects of network topology, Markov chain mixing properties, and primal-dual dynamics. We establish non-asymptotic generalization bounds for both algorithms, extending existing results on Markov stochastic gradient methods to decentralized and minimax settings.
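
For readers unfamiliar with the stability-based framework, the classical (i.i.d.) version of the argument rests on uniform stability. Below is the standard definition and its link to generalization; the paper's Markov and decentralized variants refine this implication, and their exact definitions and constants may differ.

```latex
% S ~ S' are datasets differing in a single example; A(S) is the (possibly
% randomized) algorithm's output; R is population risk, R_S empirical risk.
\[
  \sup_{S \sim S'} \; \sup_{z}\;
    \mathbb{E}_{A}\bigl[ f(A(S); z) - f(A(S'); z) \bigr] \;\le\; \epsilon
  \quad \Longrightarrow \quad
  \Bigl|\, \mathbb{E}_{S,A}\bigl[ R(A(S)) - R_S(A(S)) \bigr] \Bigr| \;\le\; \epsilon .
\]
```

Under Markov sampling the examples are no longer independent, so this implication does not apply off the shelf; consistent with the key points above, the derived bounds additionally depend on the chain's mixing properties and on the communication topology.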