Optimizing Stochastic Gradient Push under Broadcast Communications

arXiv cs.LG / April 20, 2026


Key Points

  • The paper addresses how to minimize convergence time for decentralized federated learning over wireless broadcast channels, emphasizing the role of the mixing matrix design.
  • Prior approaches for decentralized parallel SGD typically require symmetric and doubly stochastic mixing matrices, which restrict the communication graph to undirected (bidirected) structures and reduce design flexibility.
  • The authors instead focus on stochastic gradient push (SGP), which admits asymmetric mixing matrices and therefore supports directed communication graphs.
  • By deriving how SGP’s convergence rate depends on the mixing matrices, they formulate an objective tied to graph-theoretic properties and propose an efficient algorithm with performance guarantees.
  • Experiments using real data indicate the method can significantly shorten convergence time versus state-of-the-art approaches without degrading trained model quality.
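The contrast between the two mixing-matrix classes can be made concrete with a small check. The sketch below uses illustrative matrices (not taken from the paper): a symmetric doubly stochastic matrix of the kind D-PSGD requires, which forces every edge to be bidirectional, versus a merely column-stochastic matrix of the kind SGP admits, which can live on a directed graph.

```python
# Illustrative sketch: mixing-matrix constraints for D-PSGD vs. SGP.
# The matrices below are hypothetical examples, not from the paper.

def is_symmetric(A, tol=1e-9):
    n = len(A)
    return all(abs(A[i][j] - A[j][i]) < tol
               for i in range(n) for j in range(n))

def is_column_stochastic(A, tol=1e-9):
    n = len(A)
    return all(abs(sum(A[i][j] for i in range(n)) - 1.0) < tol
               for j in range(n))

def is_doubly_stochastic(A, tol=1e-9):
    n = len(A)
    rows_ok = all(abs(sum(A[i][j] for j in range(n)) - 1.0) < tol
                  for i in range(n))
    return rows_ok and is_column_stochastic(A, tol)

# Metropolis-style weights on the undirected path 0 -- 1 -- 2:
# symmetric and doubly stochastic, hence usable by D-PSGD.
W = [[2/3, 1/3, 0.0],
     [1/3, 1/3, 1/3],
     [0.0, 1/3, 2/3]]

# Column-stochastic matrix on a directed graph (each column splits a
# node's mass among its out-neighbors): valid for SGP but not D-PSGD,
# since it is neither symmetric nor doubly stochastic.
P = [[1/3, 0.0, 1/2],
     [1/3, 1/2, 0.0],
     [1/3, 1/2, 1/2]]
```

Relaxing the doubly stochastic requirement to column stochasticity is exactly what widens the design space from undirected to directed communication graphs.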

Abstract

We consider the problem of minimizing the convergence time for decentralized federated learning (DFL) in wireless networks under broadcast communications, with a focus on mixing matrix design. The mixing matrix is a critical hyperparameter for DFL that simultaneously controls the convergence rate across iterations and the communication demand per iteration, both strongly influencing the convergence time. Although the problem has been studied previously, existing solutions are mostly designed for decentralized parallel stochastic gradient descent (D-PSGD), which requires the mixing matrix to be symmetric and doubly stochastic. These constraints confine the activated communication graph to undirected (i.e., bidirected) graphs, which limits design flexibility. In contrast, we consider mixing matrix design for stochastic gradient push (SGP), which allows asymmetric mixing matrices and hence directed communication graphs. By analyzing how the convergence rate of SGP depends on the mixing matrices, we extract an objective function that explicitly depends on graph-theoretic parameters of the activated communication graph, based on which we develop an efficient design algorithm with performance guarantees. Our evaluations based on real data show that the proposed solution can notably reduce the convergence time compared to the state of the art without compromising the quality of the trained model.
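To see why a column-stochastic (rather than doubly stochastic) mixing matrix suffices, here is a minimal sketch of the push-sum averaging step at the core of SGP, on a hypothetical 3-node directed graph with scalar "models". Each node tracks a value and a weight; both are mixed with the same column-stochastic matrix, and the ratio at each node converges to the network average even though the matrix is asymmetric.

```python
# Minimal push-sum consensus sketch (the averaging core of SGP).
# Hypothetical 3-node directed graph; each column of A splits a node's
# mass among its out-neighbors (including itself), so A is
# column-stochastic but NOT doubly stochastic.

A = [[1/3, 0.0, 1/2],   # row i: what node i receives
     [1/3, 1/2, 0.0],
     [1/3, 1/2, 1/2]]

x = [3.0, 6.0, 9.0]     # node values (scalar stand-ins for models)
w = [1.0, 1.0, 1.0]     # push-sum weights

def push_sum_step(A, x, w):
    """One synchronous push-sum iteration: mix values and weights."""
    n = len(x)
    x_new = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    w_new = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    return x_new, w_new

for _ in range(50):
    x, w = push_sum_step(A, x, w)

# Each ratio x_i / w_i converges to the true average (3 + 6 + 9) / 3 = 6.
estimates = [xi / wi for xi, wi in zip(x, w)]
```

Full SGP would interleave a local stochastic gradient step with each mixing round; the sketch isolates only the consensus mechanism that makes directed graphs viable.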