Optimization Trade-offs in Asynchronous Federated Learning: A Stochastic Networks Approach

arXiv cs.LG / 3/30/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper explains why asynchronous federated learning can outperform synchronous FL by increasing update throughput, but also introduces gradient staleness and bias toward faster clients under heterogeneous data.
It presents a stochastic queueing-network framework for Generalized AsyncSGD that jointly models client/server computation times and uplink/downlink delays, enabling closed-form characterization of update throughput.
The authors derive closed-form upper bounds for communication round complexity and expected wall-clock time to reach an ε-stationary point, explicitly formalizing the trade-off between gradient staleness and convergence speed.
The framework is extended to analyze energy consumption, revealing an additional convergence-speed versus energy-efficiency trade-off.
Experiments on EMNIST show substantial improvements over AsyncSGD, including 29%–46% faster convergence time and 36%–49% lower energy consumption, and the paper also proposes gradient-based strategies to optimize routing and concurrency.

Abstract

Synchronous federated learning scales poorly due to the straggler effect. Asynchronous algorithms increase the update throughput by processing updates upon arrival, but they introduce two fundamental challenges: gradient staleness, which degrades convergence, and bias toward faster clients under heterogeneous data distributions. Although algorithms such as AsyncSGD and Generalized AsyncSGD mitigate this bias via client-side task queues, most existing analyses neglect the underlying queueing dynamics and lack closed-form characterizations of the update throughput and gradient staleness. To close this gap, we develop a stochastic queueing-network framework for Generalized AsyncSGD that jointly models random computation times at the clients and the central server, as well as random uplink and downlink communication delays. Leveraging product-form network theory, we derive a closed-form expression for the update throughput, alongside closed-form upper bounds for both the communication round complexity and the expected wall-clock time required to reach an

\epsilon

-stationary point. These results formally characterize the trade-off between gradient staleness and wall-clock convergence speed. We further extend the framework to quantify energy consumption under stochastic timing, revealing an additional trade-off between convergence speed and energy efficiency. Building on these analytical results, we propose gradient-based optimization strategies to jointly optimize routing and concurrency. Experiments on EMNIST demonstrate reductions of 29%--46% in convergence time and 36%--49% in energy consumption compared to AsyncSGD.