FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion

arXiv cs.LG / April 22, 2026


Key Points

  • FedProxy addresses the “federated fine-tuning trilemma” by targeting three issues at once: LLM IP protection, client privacy, and performance degradation on heterogeneous data.
  • The work finds that prior IP-preserving approaches such as Offsite-Tuning (OT) rely on weak adapters and therefore hit a fundamental performance bottleneck, leaving them well short of centralized training.
  • FedProxy improves fidelity by replacing lightweight adapters with a single, unified Proxy Small Language Model (SLM) compressed from the proprietary LLM to act as a surrogate for collaborative fine-tuning.
  • The proposed three-stage design combines server-guided compression, an interference-mitigating aggregation method for heterogeneity, and a training-free “plug-in” fusion step to merge the learned improvements back into the full LLM.
  • Experiments indicate FedProxy substantially outperforms OT and can approach centralized fine-tuning performance, while also setting a new benchmark for secure, high-performance federated LLM adaptation.
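The summary does not spell out how the "interference-mitigating aggregation" in stage two works. As a purely illustrative sketch, one plausible instantiation is a sign-consensus aggregation (in the spirit of TIES-style merging): coordinates of client updates whose sign conflicts with the majority-mass sign are dropped before averaging, which limits destructive interference between heterogeneous clients. All names and details below are assumptions, not the paper's actual algorithm.

```python
import numpy as np

def aggregate_updates(client_deltas, weights=None):
    """Hypothetical interference-mitigating aggregation (sign-consensus style).

    client_deltas: list of 1-D numpy arrays, each one client's parameter
    update to the shared proxy SLM. Coordinates whose sign disagrees with
    the weighted-majority sign are zeroed out before averaging, so clients
    with conflicting update directions do not cancel each other out.
    """
    deltas = np.stack(client_deltas)  # shape: (num_clients, num_params)
    if weights is None:
        weights = np.full(len(client_deltas), 1.0 / len(client_deltas))
    weights = np.asarray(weights, dtype=float)[:, None]

    # Elected sign per coordinate: sign of the total weighted update mass.
    elected = np.sign((weights * deltas).sum(axis=0))

    # Keep only the components that agree with the elected sign.
    agree = (np.sign(deltas) == elected) & (deltas != 0)
    masked = np.where(agree, deltas, 0.0)

    # Average each coordinate over the clients that actually contributed.
    contrib = (weights * agree).sum(axis=0)
    total = (weights * masked).sum(axis=0)
    return np.where(contrib > 0, total / np.maximum(contrib, 1e-12), 0.0)
```

With two equally weighted clients pushing a coordinate in opposite directions, that coordinate is dropped entirely rather than averaged toward a noisy midpoint, while coordinates the clients agree on are averaged normally.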

Abstract

Federated fine-tuning of Large Language Models (LLMs) is obstructed by a trilemma of challenges: protecting the LLM's intellectual property (IP), ensuring client privacy, and mitigating performance loss on heterogeneous data. Existing methods like Offsite-Tuning (OT) secure the LLM's IP by having clients train only lightweight adapters, yet our analysis reveals they suffer from a fundamental performance bottleneck, leaving a significant gap compared to centralized training. To bridge this gap, we introduce FedProxy, a new federated adaptation framework. FedProxy replaces weak adapters with a unified, powerful Proxy Small Language Model (SLM), compressed from the proprietary LLM, to serve as a high-fidelity surrogate for collaborative fine-tuning. Our framework systematically resolves the trilemma through a three-stage architecture: (i) Efficient Representation via server-guided compression to create a resource-friendly proxy; (ii) Robust Optimization through an interference-mitigating aggregation strategy to handle data heterogeneity; and (iii) Effortless Fusion via a training-free "plug-in" mechanism to integrate learned knowledge back into the LLM. Experiments show FedProxy significantly outperforms OT methods and approaches centralized performance, establishing a new benchmark for secure and high-performance federated LLM adaptation.
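The abstract's stage (iii) describes a training-free "plug-in" fusion but gives no mechanism. One minimal way such a step could look, assuming a task-arithmetic-style delta transfer for parameters the proxy SLM shares with the LLM by name and shape, is sketched below; the function name, the flat-list weight format, and the `alpha` scaling knob are all illustrative assumptions, not the paper's actual method.

```python
def plug_in_fusion(llm_weights, proxy_init, proxy_tuned, alpha=1.0):
    """Hypothetical training-free fusion sketch.

    Adds the proxy SLM's learned delta (tuned minus initial weights) back
    into the full LLM for every parameter tensor the two models share by
    name and length. Weights are represented as dicts of flat float lists
    for simplicity; `alpha` scales how strongly the delta is applied.
    """
    fused = dict(llm_weights)
    for name, tuned in proxy_tuned.items():
        if name in fused and name in proxy_init:
            delta = [t - i for t, i in zip(tuned, proxy_init[name])]
            # Only merge tensors whose shapes line up with the LLM's.
            if len(delta) == len(fused[name]):
                fused[name] = [w + alpha * d for w, d in zip(fused[name], delta)]
    return fused
```

Because the merge is pure weight arithmetic, no gradients or client data are needed at fusion time, which is consistent with the "training-free" claim in the abstract.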