Decentralized Learning via Random Walk with Jumps

arXiv cs.LG / April 15, 2026


Key Points

  • The paper studies decentralized learning without a central coordinator, using token-based random-walk propagation of a single model with local updates at visited nodes to keep communication and computation overhead low.
  • It analyzes weighted random-walk learning, in which the transition matrix is designed so the walk samples nodes from a target distribution, improving convergence under data heterogeneity. However, it shows that implementing this weighting via Metropolis-Hastings can cause an “entrapment” effect, where the walk gets stuck in a small region of the network.
  • The authors propose “Metropolis-Hastings with Lévy jumps” (MHLJ), which periodically introduces long-range transitions, restoring exploration while still respecting local information constraints.
  • A new convergence-rate analysis is provided, explicitly linking performance to data heterogeneity, the network spectral gap, and the jump probability.
  • Experiments indicate that MHLJ removes entrapment and substantially speeds up decentralized learning compared with the weighted Metropolis-Hastings approach.
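To make the weighted-walk mechanism concrete, here is a minimal sketch of one Metropolis-Hastings step of a token walk on a graph. The function name `mh_step` and the dict-based graph representation are illustrative choices, not the paper's implementation; the acceptance ratio shown is the standard MH correction for a uniform-over-neighbors proposal, which makes the walk's stationary distribution proportional to the target weights `pi`.

```python
import random

def mh_step(node, neighbors, pi, rng=random):
    """One Metropolis-Hastings step of a weighted random walk on a graph.

    neighbors: dict mapping each node to a list of adjacent nodes
    pi:        dict mapping each node to its (unnormalized) target weight
    The walk's stationary distribution is proportional to pi.
    """
    nbrs = neighbors[node]
    proposal = rng.choice(nbrs)  # uniform proposal over neighbors
    # Acceptance ratio corrects for the uniform-over-neighbors proposal:
    # min(1, pi_j * deg(i) / (pi_i * deg(j)))
    accept = min(1.0, (pi[proposal] * len(nbrs)) /
                      (pi[node] * len(neighbors[proposal])))
    if rng.random() < accept:
        return proposal
    return node  # rejected: the token stays put (a self-loop)
```

Note that rejected proposals keep the token in place; on poorly mixing topologies this is exactly where long runs of self-loops and repeated visits to the same small region (the entrapment the paper identifies) can arise.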

Abstract

We study decentralized learning over networks where data are distributed across nodes without a central coordinator. Random walk learning is a token-based approach in which a single model is propagated across the network and updated at each visited node using local data, thereby incurring low communication and computational overheads. In weighted random-walk learning, the transition matrix is designed to achieve a desired sampling distribution, thereby speeding up convergence under data heterogeneity. We show that implementing weighted sampling via the Metropolis-Hastings algorithm can lead to a previously unexplored phenomenon we term entrapment. The random walk may become trapped in a small region of the network, resulting in highly correlated updates and severely degraded convergence. To address this issue, we propose Metropolis-Hastings with Lévy jumps (MHLJ), which introduces occasional long-range transitions to restore exploration while respecting local information constraints. We establish a convergence rate that explicitly characterizes the roles of data heterogeneity, network spectral gap, and jump probability, and demonstrate through experiments that MHLJ effectively eliminates entrapment and significantly speeds up decentralized learning.
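The jump mechanism described in the abstract can be sketched as follows. This is a simplified illustration, not the paper's algorithm: with probability `jump_prob` the token relocates to a node drawn uniformly from the whole network, whereas the paper's Lévy jumps presumably draw jump lengths from a heavy-tailed distribution; otherwise it performs an ordinary Metropolis-Hastings step toward the target weights. The names `mhlj_step`, `all_nodes`, and `jump_prob` are illustrative.

```python
import random

def mhlj_step(node, neighbors, pi, all_nodes, jump_prob=0.05, rng=random):
    """One step of a Metropolis-Hastings walk with occasional long jumps.

    With probability jump_prob, the token escapes its local neighborhood
    by restarting at a uniformly random node (a crude stand-in for a
    Levy jump); otherwise it takes a standard MH step targeting pi.
    """
    if rng.random() < jump_prob:
        return rng.choice(all_nodes)  # long-range jump: breaks entrapment
    nbrs = neighbors[node]
    proposal = rng.choice(nbrs)
    accept = min(1.0, (pi[proposal] * len(nbrs)) /
                      (pi[node] * len(neighbors[proposal])))
    return proposal if rng.random() < accept else node
```

The jump perturbs the stationary distribution away from `pi`, which is the trade-off the paper's convergence analysis quantifies: a larger jump probability restores exploration (and mixing on networks with a small spectral gap) at the cost of biasing the sampling distribution.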