AI Navigate

Client-Conditional Federated Learning via Local Training Data Statistics

arXiv cs.LG / 3/13/2026

💬 Opinion · Models & Research

Key Points

  • This paper proposes conditioning a single global federated learning model on locally computed PCA statistics of each client's training data, addressing data heterogeneity while requiring no additional communication.
  • It evaluates across 97 configurations spanning four heterogeneity types, four datasets, and seven baseline methods, finding that the approach matches the Oracle baseline across all settings, surpasses it by 1–6% under combined heterogeneity, and is uniquely sparsity-robust among the tested methods.
  • The results show that continuous PCA-based statistics can outperform discrete cluster identifiers in guiding client-specific conditioning, especially under rich heterogeneity.
  • By removing the need for cluster discovery or per-client models, the method simplifies and strengthens the practical deployment of federated learning in sparse, heterogeneous environments.

Abstract

Federated learning (FL) under data heterogeneity remains challenging: existing methods either ignore client differences (FedAvg), require costly cluster discovery (IFCA), or maintain per-client models (Ditto). All degrade when data is sparse or heterogeneity is multi-dimensional. We propose conditioning a single global model on locally-computed PCA statistics of each client's training data, requiring zero additional communication. Evaluating across 97 configurations spanning four heterogeneity types (label shift, covariate shift, concept shift, and combined heterogeneity), four datasets (MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100), and seven FL baseline methods, we find that our method matches the Oracle baseline (which knows true cluster assignments) across all settings, surpasses it by 1–6% on combined heterogeneity where continuous statistics are richer than discrete cluster identifiers, and is uniquely sparsity-robust among all tested methods.
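The core idea above can be sketched in a few lines: each client computes a PCA summary of its own training data and feeds that vector into a shared model as an extra conditioning input. The sketch below is an illustration only; the paper's exact statistics vector and conditioning mechanism are not specified in this summary, so the choice of summary features (mean, top-k explained-variance ratios, top-k components) and the additive conditioning layer are assumptions.

```python
import numpy as np

def local_pca_statistics(X, k=4):
    # Center the client's local training data; nothing here leaves the client.
    Xc = X - X.mean(axis=0)
    # SVD of the centered data yields principal components without forming
    # the covariance matrix explicitly.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = (s ** 2) / (len(X) - 1)  # per-component variance
    # Hypothetical statistics vector: data mean, top-k explained-variance
    # ratios, and the top-k principal directions flattened.
    return np.concatenate([X.mean(axis=0), var[:k] / var.sum(), Vt[:k].ravel()])

def conditioned_forward(x, stats, W_x, W_s, b):
    # One layer of a single global model, conditioned additively on the
    # client's statistics vector (an assumed mechanism, akin to FiLM-style
    # conditioning via a learned projection of the statistics).
    return np.tanh(x @ W_x + stats @ W_s + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))          # one client's local training data
stats = local_pca_statistics(X, k=4)  # computed locally, never communicated
W_x = rng.normal(size=(8, 16))        # shared global weights
W_s = rng.normal(size=(stats.size, 16))
b = np.zeros(16)
h = conditioned_forward(X[0], stats, W_x, W_s, b)
print(stats.shape, h.shape)           # (44,) (16,)
```

Because the statistics are a deterministic function of data the client already holds, every client can condition the same global weights differently at zero communication cost, which is the property the abstract highlights.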