Shard the Gradient, Scale the Model: Serverless Federated Aggregation via Gradient Partitioning
arXiv cs.AI / 4/27/2026
Key Points
- The paper identifies a key scalability bottleneck for federated learning on serverless platforms: existing designs require each aggregator to hold the full model gradient in memory, which fails once gradients exceed per-function memory limits like AWS Lambda’s ~10 GB.
- It proposes GradsSharding, which partitions the gradient tensor into M shards and averages each shard independently in separate serverless functions that still receive contributions from all clients.
- The authors claim FedAvg’s element-wise nature makes the sharded approach produce bit-identical aggregation results to tree-based methods, so model accuracy is invariant by construction.
- Experiments on an HPC cluster and real AWS Lambda deployments, with gradient sizes from 43 MB to 5 GB, show a cost crossover around 500 MB, roughly a 2.7x cost reduction at VGG-16 scale, and the ability to aggregate models beyond the serverless memory ceiling where prior architectures fail outright.
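The equivalence claim in the third point follows because FedAvg's averaging is element-wise: partitioning the gradient, averaging each shard separately, and concatenating the results touches every element exactly once. A minimal sketch of that idea, assuming equal client weights; the names `shard_average` and `NUM_SHARDS` are illustrative, not taken from the paper:

```python
import numpy as np

NUM_SHARDS = 4  # M: number of independent serverless aggregator functions

def shard_average(client_gradients, num_shards):
    """Average each shard across all clients, as M separate functions would."""
    # Split every client's flat gradient into M contiguous shards.
    sharded = [np.array_split(g, num_shards) for g in client_gradients]
    # Each "aggregator" holds only 1/M of the gradient and averages its shard
    # over contributions from all clients.
    shard_means = [np.mean([client[s] for client in sharded], axis=0)
                   for s in range(num_shards)]
    return np.concatenate(shard_means)

rng = np.random.default_rng(0)
clients = [rng.standard_normal(1000) for _ in range(8)]

monolithic = np.mean(clients, axis=0)         # one aggregator, full gradient in memory
sharded = shard_average(clients, NUM_SHARDS)  # M aggregators, 1/M each

# The per-element sums involve the same operands in the same order,
# so the two results match.
assert np.array_equal(monolithic, sharded)
```

Because each shard's average depends only on that shard's elements, the M functions need no coordination beyond a final concatenation, which is what lets the scheme sidestep the per-function memory limit.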
