FedSQ: Optimized Weight Averaging via Fixed Gating

arXiv cs.LG / 4/6/2026


Key Points

  • FedSQ is a federated learning approach designed to address instability in naive weight averaging caused by client data heterogeneity (non-i.i.d. splits) during federated fine-tuning.
  • The method leverages the observation that ReLU-like (piecewise-linear) gating regimes can stabilize training by treating model structure (“structural knowledge”) separately from remaining parameters (“quantitative knowledge”).
  • FedSQ uses a DualCopy setup where a frozen copy of a pretrained backbone induces fixed binary gating masks, while only a quantitative copy is trained locally and aggregated across federated rounds.
  • By fixing the gating masks, FedSQ restricts learning to within-regime affine refinements, improving the stability of aggregation under heterogeneous client partitions.
  • Experiments on two CNN backbones across i.i.d. and Dirichlet data splits show improved robustness and potentially fewer rounds to reach best validation performance while maintaining accuracy in transfer-initialized settings.
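The DualCopy mechanism described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the function and variable names (`dualcopy_forward`, `relu_mask`, `structural_weights`, `quantitative_weights`) are assumptions, and the network is a toy fully-connected stack. The key idea it demonstrates is that the frozen structural copy alone decides each ReLU gating regime, while the trainable quantitative copy only refines values within that regime.

```python
import numpy as np

def relu_mask(x):
    """Binary gating regime: 1 where the pre-activation is positive."""
    return (x > 0).astype(x.dtype)

def dualcopy_forward(x, structural_weights, quantitative_weights):
    """Hypothetical DualCopy forward pass (illustrative, not the paper's code).

    structural_weights / quantitative_weights: lists of (W, b) pairs per layer.
    The structural copy is frozen and induces the fixed binary gating masks;
    only the quantitative copy would receive gradient updates during training.
    """
    h_s, h_q = x, x
    for (W_s, b_s), (W_q, b_q) in zip(structural_weights, quantitative_weights):
        z_s = h_s @ W_s + b_s          # frozen copy decides the gating regime
        z_q = h_q @ W_q + b_q          # trainable copy carries the values
        m = relu_mask(z_s)             # fixed binary mask (treated as constant)
        h_s = m * z_s                  # structural path stays in its regime
        h_q = m * z_q                  # within-regime affine refinement
    return h_q
```

If both copies hold identical weights (as at the start of fine-tuning, when the quantitative copy is initialized from the frozen backbone), this reduces exactly to a standard ReLU forward pass; as the quantitative copy drifts during local training, the masks remain those of the pretrained backbone.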

Abstract

Federated learning (FL) enables collaborative training across organizations without sharing raw data, but it is hindered by statistical heterogeneity (non-i.i.d. client data) and by instability of naive weight averaging under client drift. In many cross-silo deployments, FL is warm-started from a strong pretrained backbone (e.g., ImageNet-1K) and then adapted to local domains. Motivated by recent evidence that ReLU-like gating regimes (structural knowledge) stabilize earlier than the remaining parameter values (quantitative knowledge), we propose FedSQ (Federated Structural-Quantitative learning), a transfer-initialized neural federated procedure based on a DualCopy, piecewise-linear view of deep networks. FedSQ freezes a structural copy of the pretrained model to induce fixed binary gating masks during federated fine-tuning, while only a quantitative copy is optimized locally and aggregated across rounds. Fixing the gating reduces learning to within-regime affine refinements, which stabilizes aggregation under heterogeneous partitions. Experiments on two convolutional neural network backbones under i.i.d. and Dirichlet splits show that FedSQ improves robustness and can reduce rounds-to-best validation performance relative to standard baselines while preserving accuracy in the transfer setting.
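The aggregation step the abstract refers to can be illustrated with a minimal sketch of standard FedAvg, the weighted weight-averaging rule FedSQ builds on. The function name `fedavg` and the parameter layout are assumptions for illustration; the point is that only the quantitative copy's parameters are averaged, while the frozen structural copy (and hence the gating masks) is identical on every client and never mixed.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted averaging of clients' quantitative parameters (FedAvg).

    client_weights: list over clients, each a list of per-layer arrays.
    client_sizes: number of local samples per client, used as weights.
    In a FedSQ-style setup, only the quantitative copy would be passed here;
    the structural copy stays frozen and is never aggregated.
    """
    total = sum(client_sizes)
    return [
        sum(n * layer for n, layer in zip(client_sizes, layers)) / total
        for layers in zip(*client_weights)
    ]
```

Because every client shares the same fixed gating masks, this average combines affine maps that live in the same piecewise-linear regions, which is the stated reason aggregation is more stable under heterogeneous (e.g., Dirichlet) partitions.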