Tighter Performance Theory of FedExProx
arXiv stat.ML / 4/21/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper re-examines FedExProx, a distributed optimization method that uses extrapolation to improve convergence in parallel proximal algorithms.
- It uncovers a surprising issue: the previously claimed convergence guarantees for quadratic optimization are no better than those of standard gradient descent (GD).
- The authors introduce a new analysis framework that proves a tighter linear convergence rate for non-strongly convex quadratic problems, and shows that FedExProx can outperform GD once computation and communication costs are taken into account.
- The work further studies partial participation and proposes two adaptive extrapolation strategies, based on gradient diversity and Polyak stepsizes, that substantially improve over earlier results (a minimal sketch of the iteration and the gradient-diversity rule appears after this list).
- Beyond quadratics, the analysis is extended to functions satisfying the Polyak–Łojasiewicz condition, with empirical evidence suggesting that FedExProx has even stronger potential to benefit from extrapolation in federated learning settings.
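
For readers who want to see the mechanics, here is a minimal sketch of one FedExProx-style round: each client computes a proximal point of its local loss at the current server model, the server averages these points, and then extrapolates beyond the average with a parameter α. The quadratic client losses, the closed-form proximal solver, and the exact form of the gradient-diversity extrapolation rule below are illustrative assumptions for this sketch, not the paper's implementation.

```python
import numpy as np

# Illustrative FedExProx-style round (sketch, not the paper's code).
# Assumption: each client i holds a quadratic loss f_i(x) = 0.5 * ||A_i x - b_i||^2,
# so its proximal operator prox_{gamma f_i}(x) has a closed form.

def prox_quadratic(A, b, x, gamma):
    """Proximal step for f(x) = 0.5 * ||A x - b||^2 with parameter gamma:
    argmin_y f(y) + ||y - x||^2 / (2 * gamma)."""
    d = x.shape[0]
    return np.linalg.solve(np.eye(d) + gamma * A.T @ A, x + gamma * A.T @ b)

def fedexprox_step(x, clients, gamma, adaptive=True):
    """One round: clients compute prox points, server averages and extrapolates."""
    prox_points = np.stack([prox_quadratic(A, b, x, gamma) for A, b in clients])
    # Per-client displacement x - prox_i(x); up to the factor 1/gamma this is
    # the gradient of the Moreau envelope of f_i at x.
    g = x - prox_points
    g_bar = g.mean(axis=0)
    if adaptive:
        # Gradient-diversity-style extrapolation (illustrative form of the
        # adaptive rule; the paper's exact expression may differ). It is >= 1
        # by Jensen's inequality, so the server always steps at least as far
        # as plain averaging.
        alpha = np.mean(np.sum(g**2, axis=1)) / max(np.sum(g_bar**2), 1e-12)
    else:
        alpha = 1.0  # alpha = 1 recovers plain averaging of the prox points
    return x - alpha * g_bar  # equivalently x + alpha * (mean prox point - x)
```

Running `fedexprox_step` repeatedly with `adaptive=True` illustrates the point of the adaptive rules: when client proximal updates point in diverse directions, α grows and the server takes a longer extrapolated step, whereas α = 1 falls back to the plain averaged proximal update.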