FedSQ: Optimized Weight Averaging via Fixed Gating

arXiv cs.LG / 4/6/2026


Key Points

  • FedSQ is a federated learning approach designed to address instability in naive weight averaging caused by client data heterogeneity (non-i.i.d. splits) during federated fine-tuning.
  • The method leverages the observation that ReLU-like (piecewise-linear) gating regimes can stabilize training by treating model structure (“structural knowledge”) separately from remaining parameters (“quantitative knowledge”).
  • FedSQ uses a DualCopy setup where a frozen copy of a pretrained backbone induces fixed binary gating masks, while only a quantitative copy is trained locally and aggregated across federated rounds.
  • By fixing the gating masks, FedSQ restricts learning to within-regime affine refinements, improving the stability of aggregation under heterogeneous client partitions.
  • Experiments on two CNN backbones across i.i.d. and Dirichlet data splits show improved robustness and potentially fewer rounds to reach best validation performance while maintaining accuracy in transfer-initialized settings.
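The DualCopy mechanism described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the function and variable names (`dualcopy_forward`, `relu_mask`, `structural_weights`, `quantitative_weights`) are assumptions, and the network is a toy fully-connected stack. The key idea it demonstrates is that the frozen structural copy alone decides each ReLU gating regime, while the trainable quantitative copy only refines values within that regime.

```python
import numpy as np

def relu_mask(x):
    """Binary gating regime: 1 where the pre-activation is positive."""
    return (x > 0).astype(x.dtype)

def dualcopy_forward(x, structural_weights, quantitative_weights):
    """Hypothetical DualCopy forward pass (illustrative, not the paper's code).

    structural_weights / quantitative_weights: lists of (W, b) pairs per layer.
    The structural copy is frozen and induces the fixed binary gating masks;
    only the quantitative copy would receive gradient updates during training.
    """
    h_s, h_q = x, x
    for (W_s, b_s), (W_q, b_q) in zip(structural_weights, quantitative_weights):
        z_s = h_s @ W_s + b_s          # frozen copy decides the gating regime
        z_q = h_q @ W_q + b_q          # trainable copy carries the values
        m = relu_mask(z_s)             # fixed binary mask (treated as constant)
        h_s = m * z_s                  # structural path stays in its regime
        h_q = m * z_q                  # within-regime affine refinement
    return h_q
```

If both copies hold identical weights (as at the start of fine-tuning, when the quantitative copy is initialized from the frozen backbone), this reduces exactly to a standard ReLU forward pass; as the quantitative copy drifts during local training, the masks remain those of the pretrained backbone.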

Abstract

Federated learning (FL) enables collaborative training across organizations without sharing raw data, but it is hindered by statistical heterogeneity (non-i.i.d. client data) and by instability of naive weight averaging under client drift. In many cross-silo deployments, FL is warm-started from a strong pretrained backbone (e.g., ImageNet-1K) and then adapted to local domains. Motivated by recent evidence that ReLU-like gating regimes (structural knowledge) stabilize earlier than the remaining parameter values (quantitative knowledge), we propose FedSQ (Federated Structural-Quantitative learning), a transfer-initialized neural federated procedure based on a DualCopy, piecewise-linear view of deep networks. FedSQ freezes a structural copy of the pretrained model to induce fixed binary gating masks during federated fine-tuning, while only a quantitative copy is optimized locally and aggregated across rounds. Fixing the gating reduces learning to within-regime affine refinements, which stabilizes aggregation under heterogeneous partitions. Experiments on two convolutional neural network backbones under i.i.d. and Dirichlet splits show that FedSQ improves robustness and can reduce rounds-to-best validation performance relative to standard baselines while preserving accuracy in the transfer setting.
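The aggregation step the abstract refers to can be illustrated with a minimal sketch of standard FedAvg, the weighted weight-averaging rule FedSQ builds on. The function name `fedavg` and the parameter layout are assumptions for illustration; the point is that only the quantitative copy's parameters are averaged, while the frozen structural copy (and hence the gating masks) is identical on every client and never mixed.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted averaging of clients' quantitative parameters (FedAvg).

    client_weights: list over clients, each a list of per-layer arrays.
    client_sizes: number of local samples per client, used as weights.
    In a FedSQ-style setup, only the quantitative copy would be passed here;
    the structural copy stays frozen and is never aggregated.
    """
    total = sum(client_sizes)
    return [
        sum(n * layer for n, layer in zip(client_sizes, layers)) / total
        for layers in zip(*client_weights)
    ]
```

Because every client shares the same fixed gating masks, this average combines affine maps that live in the same piecewise-linear regions, which is the stated reason aggregation is more stable under heterogeneous (e.g., Dirichlet) partitions.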