Optimizing Stochastic Gradient Push under Broadcast Communications

arXiv cs.LG / April 20, 2026


Key Points

  • The paper addresses how to minimize convergence time for decentralized federated learning over wireless broadcast channels, emphasizing the role of the mixing matrix design.
  • Prior approaches for decentralized parallel SGD typically require symmetric and doubly stochastic mixing matrices, which restrict the communication graph to undirected (bidirected) structures and reduce design flexibility.
  • The authors instead focus on stochastic gradient push (SGP), which admits asymmetric mixing matrices and therefore supports directed communication graphs.
  • By deriving how SGP’s convergence rate depends on the mixing matrices, they formulate an objective tied to graph-theoretic properties and propose an efficient algorithm with performance guarantees.
  • Experiments using real data indicate the method can significantly shorten convergence time versus state-of-the-art approaches without degrading trained model quality.
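The contrast between the two mixing-matrix classes can be made concrete with a small check. The sketch below uses illustrative matrices (not taken from the paper): a symmetric doubly stochastic matrix of the kind D-PSGD requires, which forces every edge to be bidirectional, versus a merely column-stochastic matrix of the kind SGP admits, which can live on a directed graph.

```python
# Illustrative sketch: mixing-matrix constraints for D-PSGD vs. SGP.
# The matrices below are hypothetical examples, not from the paper.

def is_symmetric(A, tol=1e-9):
    n = len(A)
    return all(abs(A[i][j] - A[j][i]) < tol
               for i in range(n) for j in range(n))

def is_column_stochastic(A, tol=1e-9):
    n = len(A)
    return all(abs(sum(A[i][j] for i in range(n)) - 1.0) < tol
               for j in range(n))

def is_doubly_stochastic(A, tol=1e-9):
    n = len(A)
    rows_ok = all(abs(sum(A[i][j] for j in range(n)) - 1.0) < tol
                  for i in range(n))
    return rows_ok and is_column_stochastic(A, tol)

# Metropolis-style weights on the undirected path 0 -- 1 -- 2:
# symmetric and doubly stochastic, hence usable by D-PSGD.
W = [[2/3, 1/3, 0.0],
     [1/3, 1/3, 1/3],
     [0.0, 1/3, 2/3]]

# Column-stochastic matrix on a directed graph (each column splits a
# node's mass among its out-neighbors): valid for SGP but not D-PSGD,
# since it is neither symmetric nor doubly stochastic.
P = [[1/3, 0.0, 1/2],
     [1/3, 1/2, 0.0],
     [1/3, 1/2, 1/2]]
```

Relaxing the doubly stochastic requirement to column stochasticity is exactly what widens the design space from undirected to directed communication graphs.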

Abstract

We consider the problem of minimizing the convergence time for decentralized federated learning (DFL) in wireless networks under broadcast communications, with a focus on mixing matrix design. The mixing matrix is a critical hyperparameter for DFL that simultaneously controls the convergence rate across iterations and the communication demand per iteration, both strongly influencing the convergence time. Although the problem has been studied previously, existing solutions are mostly designed for decentralized parallel stochastic gradient descent (D-PSGD), which requires the mixing matrix to be symmetric and doubly stochastic. These constraints confine the activated communication graph to undirected (i.e., bidirected) graphs, which limits design flexibility. In contrast, we consider mixing matrix design for stochastic gradient push (SGP), which allows asymmetric mixing matrices and hence directed communication graphs. By analyzing how the convergence rate of SGP depends on the mixing matrices, we extract an objective function that explicitly depends on graph-theoretic parameters of the activated communication graph, based on which we develop an efficient design algorithm with performance guarantees. Our evaluations based on real data show that the proposed solution can notably reduce the convergence time compared to the state of the art without compromising the quality of the trained model.
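To see why a column-stochastic (rather than doubly stochastic) mixing matrix suffices, here is a minimal sketch of the push-sum averaging step at the core of SGP, on a hypothetical 3-node directed graph with scalar "models". Each node tracks a value and a weight; both are mixed with the same column-stochastic matrix, and the ratio at each node converges to the network average even though the matrix is asymmetric.

```python
# Minimal push-sum consensus sketch (the averaging core of SGP).
# Hypothetical 3-node directed graph; each column of A splits a node's
# mass among its out-neighbors (including itself), so A is
# column-stochastic but NOT doubly stochastic.

A = [[1/3, 0.0, 1/2],   # row i: what node i receives
     [1/3, 1/2, 0.0],
     [1/3, 1/2, 1/2]]

x = [3.0, 6.0, 9.0]     # node values (scalar stand-ins for models)
w = [1.0, 1.0, 1.0]     # push-sum weights

def push_sum_step(A, x, w):
    """One synchronous push-sum iteration: mix values and weights."""
    n = len(x)
    x_new = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    w_new = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    return x_new, w_new

for _ in range(50):
    x, w = push_sum_step(A, x, w)

# Each ratio x_i / w_i converges to the true average (3 + 6 + 9) / 3 = 6.
estimates = [xi / wi for xi, wi in zip(x, w)]
```

Full SGP would interleave a local stochastic gradient step with each mixing round; the sketch isolates only the consensus mechanism that makes directed graphs viable.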