Abstract
Federated learning increasingly operates in a large-model regime where communication, memory, and computation are all scarce. Typically, non-IID client data induce drift that degrades the stability and performance of local training. Existing remedies such as SCAFFOLD introduce heterogeneity-correction mechanisms to address this challenge, but they incur substantial extra communication and memory overhead. This paper proposes a subspace optimization method for federated learning (SSF), which performs heterogeneity-corrected optimization in a low-dimensional subspace using only projected quantities, while preserving full-dimensional control information through a backfill-style update that retains residual components whenever the active subspace changes. Under standard smoothness and bounded-variance assumptions, SSF attains a non-asymptotic rate of order \widetilde{\mathcal{O}}(1/T+1/\sqrt{NKT}). Experiments show favorable accuracy--efficiency trade-offs under heterogeneous data.