A Hierarchical Sampling Framework for Bounding the Generalization Error of Federated Learning

arXiv cs.LG / 5/6/2026


Key Points

  • The paper proposes a hierarchical sampling framework for Hierarchical Federated Learning (HFL) and analyzes the expected generalization error via the Wasserstein distance.
  • It models hierarchical data sampling with a multi-layer tree that captures dependencies among clients’ datasets, then derives Wasserstein-based generalization bounds under a Lipschitz assumption on the loss (see the sketch after this list).
  • A supersample construction quantifies how sensitive the learning algorithm is to resampling a single node in the sampling tree.
  • By leveraging the federated learning structure, the resulting bounds recover and strictly imply existing state-of-the-art conditional mutual information (CMI) bounds for bounded losses.
  • The framework composes with Differential Privacy assumptions to yield generalization bounds tied to algorithmic privacy, and the paper validates tightness on the Gaussian Location Model (GLM), recovering the true asymptotic rate of the generalization error.
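
To make the tree-sampling picture concrete, here is a minimal NumPy sketch of a two-layer latent hierarchy and a single-node resampling probe. All names (`sample_hierarchy`, `hfl_average`), the Gaussian latents, and the averaging "algorithm" are our own illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hierarchy(rng, n_groups=3, clients_per_group=4, n=20):
    """Sample client datasets from a two-layer latent tree:
    root -> group latents -> client latents -> data.
    Clients sharing a group latent are statistically dependent."""
    root = rng.normal()                               # root node
    groups = root + rng.normal(size=n_groups)         # mid-layer nodes
    datasets = []
    for g in groups:
        for _ in range(clients_per_group):
            client_mean = g + rng.normal()            # leaf node
            datasets.append(client_mean + rng.normal(size=n))
    return datasets

def hfl_average(datasets):
    # Stand-in "learning algorithm": average of per-client means,
    # i.e., one round of unweighted hierarchical aggregation.
    return float(np.mean([d.mean() for d in datasets]))

data = sample_hierarchy(rng)
w = hfl_average(data)

# Loose analogue of the supersample probe: redraw the data at a single
# leaf node and measure how much the algorithm's output moves.
swapped = list(data)
swapped[0] = swapped[0].mean() + rng.normal(size=swapped[0].shape[0])
w_swapped = hfl_average(swapped)
print(f"output shift after a one-node swap: {abs(w - w_swapped):.4f}")
```

Clients under the same group latent are correlated, which is exactly the dependence the multi-layer tree is meant to capture; the one-node swap mimics, very loosely, the sensitivity that the supersample construction measures.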

Abstract

We study expected generalization bounds for the Hierarchical Federated Learning (HFL) setup using the Wasserstein distance. We introduce a generalized framework in which data is sampled hierarchically, and we model it with a multi-layered tree structure that induces dependencies among the clients' datasets. We derive generalization bounds in terms of the Wasserstein distance under a Lipschitz assumption on the loss function, applying a supersample construction that lets us measure the sensitivity of the algorithm to the change of a single node in the sampling tree. By leveraging the FL structure, we recover and strictly imply existing state-of-the-art conditional mutual information (CMI) bounds in the case of bounded losses. We also show that our bound can be combined with Differential Privacy assumptions to recover generalization bounds based on algorithmic privacy. To assess the tightness of our bounds, we study the Gaussian Location Model (GLM) and show that we recover the actual asymptotic rate of the generalization error.
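
As a hedged illustration of the mechanism behind such bounds (the paper's exact statements over the sampling tree are not reproduced here), Kantorovich–Rubinstein duality is the standard step from a Lipschitz loss to a Wasserstein-distance bound, and the Gaussian Location Model admits a closed-form generalization error against which tightness can be checked:

```latex
% Kantorovich--Rubinstein duality: if z \mapsto \ell(w, z) is L-Lipschitz,
% the expected-loss gap between two data distributions P and Q satisfies
\[
  \Bigl| \mathbb{E}_{Z \sim P}\,\ell(w, Z)
       - \mathbb{E}_{Z \sim Q}\,\ell(w, Z) \Bigr|
  \;\le\; L \, W_1(P, Q),
\]
% which is the basic reason a Wasserstein distance between "true" and
% "resampled" node distributions can control generalization.
%
% Gaussian Location Model (standard in this literature):
% Z_i \sim \mathcal{N}(\mu, \sigma^2 I_d), the learner outputs the
% empirical mean \hat{\mu} = \frac{1}{n}\sum_{i=1}^n Z_i, and the loss is
% \ell(w, z) = \lVert w - z \rVert^2. A direct computation gives
\[
  \mathbb{E}\!\left[ \mathbb{E}_{Z}\lVert \hat{\mu} - Z \rVert^2
    - \frac{1}{n}\sum_{i=1}^n \lVert \hat{\mu} - Z_i \rVert^2 \right]
  \;=\; \frac{2 d \sigma^2}{n},
\]
% so the true generalization error decays as \Theta(1/n); this is the
% asymptotic rate the abstract says the bounds recover.
```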