FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment

arXiv cs.LG · March 23, 2026

📰 News · Models & Research

Key Points

  • The paper addresses aligning large language models (LLMs) with human preferences in federated learning (FL), highlighting challenges from decentralized, privacy-sensitive, and non-IID data and noting the limitations of applying Direct Preference Optimization (DPO) directly in FL.
  • FedPDPO proposes a parameter-efficient, federated personalization framework that uses a frozen pretrained LLM backbone with a LoRA adapter to enable communication-efficient aggregation.
  • The approach includes a globally shared LoRA adapter paired with client-specific LLM heads, a client-specific explicit reward head, and a bottleneck adapter to balance global and local feature representations.
  • The authors provide theoretical analysis and demonstrate state-of-the-art performance through extensive experiments, reporting up to 4.80% average accuracy improvements in both federated intra-domain and cross-domain settings.
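The personalization pattern the key points describe — aggregating only the shared LoRA adapter across clients while each client keeps its own head local — can be sketched as below. This is a minimal illustration with NumPy arrays standing in for adapter weights; the function name `fed_avg_lora` and the uniform-FedAvg weighting are assumptions for the sketch, not details from the paper.

```python
import numpy as np

def fed_avg_lora(client_states, weights=None):
    """FedAvg-style aggregation restricted to the shared LoRA adapter.

    client_states: list of dicts with keys "lora_A" and "lora_B"
    (the shared low-rank factors, aggregated by the server) and
    "head" (the personalized client head, never communicated).
    Returns the new global adapter; client heads stay local.
    """
    n = len(client_states)
    weights = weights or [1.0 / n] * n  # uniform client weighting (assumption)
    global_adapter = {}
    for key in ("lora_A", "lora_B"):  # only adapter parameters are averaged
        global_adapter[key] = sum(w * s[key] for w, s in zip(weights, client_states))
    return global_adapter

# Two toy clients: shared rank-2 LoRA factors plus a private head.
rng = np.random.default_rng(0)
clients = [
    {"lora_A": rng.normal(size=(4, 2)),
     "lora_B": rng.normal(size=(2, 4)),
     "head": rng.normal(size=(4,))}
    for _ in range(2)
]
agg = fed_avg_lora(clients)
# Each client then loads the averaged adapter but keeps its own head,
# which is how the framework balances global sharing with personalization.
```

Note that the aggregated payload contains only the low-rank factors, which is what makes the round-trip communication-efficient relative to exchanging full model weights.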

Abstract

Aligning large language models (LLMs) with human preferences in federated learning (FL) is challenging due to decentralized, privacy-sensitive, and highly non-IID preference data. Direct Preference Optimization (DPO) offers an efficient alternative to reinforcement learning with human feedback (RLHF), but its direct application in FL suffers from severe performance degradation under non-IID data and limited generalization of implicit rewards. To bridge this gap, we propose FedPDPO (Federated Personalized Direct Preference Optimization), a personalized federated framework for preference alignment of LLMs. It adopts a parameter-efficient fine-tuning architecture where each client maintains a frozen pretrained LLM backbone augmented with a Low-Rank Adaptation (LoRA) adapter, enabling communication-efficient aggregation. To address non-IID heterogeneity, we devise (1) a globally shared LoRA adapter paired with personalized, client-specific LLM heads. Moreover, we introduce (2) a personalized DPO training strategy with a client-specific explicit reward head to complement implicit rewards and further alleviate non-IID heterogeneity, and (3) a bottleneck adapter to balance global and local features. We provide theoretical analysis establishing the probabilistic foundation and soundness of the framework. Extensive experiments on multiple preference datasets demonstrate state-of-the-art performance, achieving up to 4.80% average accuracy improvements in federated intra-domain and cross-domain settings.
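For readers unfamiliar with DPO, the loss the abstract builds on can be written down in a few lines. The standard per-pair DPO loss is -log σ(β[(log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))]); the sketch below also shows one plausible way an explicit per-client reward head could complement the implicit reward, as the abstract describes. The mixing weight `alpha` and the additive combination in `personalized_dpo_loss` are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (chosen y_w, rejected y_l) pair.

    The implicit reward is beta * (log pi(y|x) - log pi_ref(y|x));
    the loss pushes the chosen-minus-rejected reward margin up.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(sigmoid(margin))

def personalized_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                          r_w, r_l, beta=0.1, alpha=0.5):
    """Hypothetical sketch: mix the implicit DPO reward margin with the
    score margin of a client-specific explicit reward head (r_w, r_l).
    alpha and the convex combination are assumptions for illustration."""
    implicit = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    explicit = r_w - r_l
    return -np.log(sigmoid(alpha * implicit + (1 - alpha) * explicit))

# When the policy matches the reference, the margin is 0 and the
# loss is -log(0.5) = log 2; a positive margin drives the loss down.
```

Because the explicit reward head is kept client-specific, its margin term can absorb local preference idiosyncrasies that the globally shared adapter would otherwise have to fit, which is the intuition behind using it against non-IID heterogeneity.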