BlazeFL: Fast and Deterministic Federated Learning Simulation

arXiv cs.LG / 4/7/2026


Key Points

  • BlazeFL is a lightweight single-node federated learning simulation framework designed to improve both efficiency and reproducibility in high-concurrency experiments with many virtual clients.
  • It achieves deterministic behavior by assigning each client an isolated RNG stream; as long as stochastic operators consume BlazeFL-managed generators, repeated runs produce bitwise-identical results.
  • BlazeFL uses thread-based parallelism with in-memory server-client parameter exchange to avoid serialization and inter-process communication overhead.
  • In CIFAR-10 experiments, BlazeFL reports up to 3.1× speedup versus a common open-source baseline while keeping a small dependency footprint.
  • The authors provide an open-source implementation on GitHub to support adoption by FL researchers needing deterministic, fast simulation setups.
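The per-client RNG isolation described above can be sketched with NumPy's `SeedSequence.spawn`, which derives statistically independent child streams from one root seed. This is an illustrative sketch, not BlazeFL's actual API; the function name `make_client_generators` is hypothetical.

```python
import numpy as np

def make_client_generators(root_seed, num_clients):
    # Hypothetical helper: spawn one independent, reproducible
    # RNG stream per virtual client from a single root seed.
    root = np.random.SeedSequence(root_seed)
    return [np.random.Generator(np.random.PCG64(s)) for s in root.spawn(num_clients)]

gens = make_client_generators(42, 4)
# Each client draws only from its own stream, so results do not
# depend on thread scheduling order; rebuilding the generators
# from the same root seed reproduces identical draws.
draws = [g.integers(0, 100, size=3).tolist() for g in gens]
```

Because each client owns its generator, concurrent clients never contend for shared random state, which is the scheduling-variability problem the framework targets.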

Abstract

Federated learning (FL) research increasingly relies on single-node simulations with hundreds or thousands of virtual clients, making both efficiency and reproducibility essential. Yet parallel client training often introduces nondeterminism through shared random state and scheduling variability, forcing researchers to trade throughput for reproducibility or to implement custom control logic within complex frameworks. We present BlazeFL, a lightweight framework for single-node FL simulation that alleviates this trade-off through free-threaded shared-memory execution and deterministic randomness management. BlazeFL uses thread-based parallelism with in-memory parameter exchange between the server and clients, avoiding serialization and inter-process communication overhead. To support deterministic execution, BlazeFL assigns isolated random number generator (RNG) streams to clients. Under a fixed software/hardware stack, and when stochastic operators consume BlazeFL-managed generators, this design yields bitwise-identical results across repeated high-concurrency runs in both thread-based and process-based modes. In CIFAR-10 image-classification experiments, BlazeFL substantially reduces execution time relative to a widely used open-source baseline, achieving up to 3.1× speedup on communication-dominated workloads while preserving a lightweight dependency footprint. Our open-source implementation is available at: https://github.com/kitsuyaazuma/blazefl.
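The in-memory server-client exchange the abstract describes can be illustrated with a toy shared-memory setup: client threads pull the global model and push updates as plain array references, with no pickling or inter-process transport. This is a minimal sketch under assumed names (`InMemoryServer`, `pull`, `push`, `aggregate` are hypothetical, not BlazeFL's API), using simple FedAvg-style mean aggregation.

```python
import threading
import numpy as np

class InMemoryServer:
    """Toy server sharing parameters with client threads directly
    in memory, sketching the serialization-free exchange idea."""
    def __init__(self, params):
        self.params = params            # global model: list of arrays
        self.lock = threading.Lock()
        self.updates = []

    def pull(self):
        with self.lock:                 # clients read a copy of the global model
            return [p.copy() for p in self.params]

    def push(self, update):
        with self.lock:                 # clients hand updates back by reference
            self.updates.append(update)

    def aggregate(self):
        with self.lock:                 # FedAvg-style mean over client updates
            self.params = [np.mean(stack, axis=0) for stack in zip(*self.updates)]
            self.updates.clear()

def client(server, delta):
    local = server.pull()
    server.push([p + delta for p in local])  # stand-in for local training

server = InMemoryServer([np.zeros(2)])
threads = [threading.Thread(target=client, args=(server, d)) for d in (1.0, 3.0)]
for t in threads:
    t.start()
for t in threads:
    t.join()
server.aggregate()
# server.params[0] is the mean of [1, 1] and [3, 3], i.e. [2., 2.]
```

The point of the sketch is the data path: because server and clients share one address space, parameter tensors cross the "network" as references or cheap copies, which is what lets a thread-based simulator skip the serialization and IPC costs of process-based designs.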