Toward Efficient Membership Inference Attacks against Federated Large Language Models: A Projection Residual Approach

arXiv cs.LG / 4/24/2026


Key Points

  • Federated Large Language Models (FedLLMs) can collaboratively fine-tune without sharing raw data, but shared gradients can still leak membership information via membership inference attacks (MIAs).
  • The paper argues that prior MIA techniques fail against FedLLMs due to their massive parameter scales, rapid convergence, and sparse, non-orthogonal gradient structures.
  • It introduces ProjRes, a passive, projection-residuals-based MIA that uses hidden embedding vectors and projection residuals in the gradient subspace to infer whether specific samples were used.
  • ProjRes does not rely on shadow models, auxiliary classifiers, or historical updates, and experiments report near-100% accuracy with large improvements over prior work (up to 75.75%) even under strong differential privacy.
  • The authors conclude that FedLLMs have an overlooked privacy vulnerability and recommend re-evaluating existing security assumptions, with code and data released publicly.

Abstract

Federated Large Language Models (FedLLMs) enable multiple parties to collaboratively fine-tune LLMs without sharing raw data, addressing challenges of limited resources and privacy concerns. Despite data localization, shared gradients can still expose sensitive information through membership inference attacks (MIAs). However, FedLLMs' unique properties, i.e., massive parameter scales, rapid convergence, and sparse, non-orthogonal gradients, render existing MIAs ineffective. To address this gap, we propose ProjRes, the first projection-residuals-based passive MIA tailored for FedLLMs. ProjRes leverages hidden embedding vectors as sample representations and analyzes their projection residuals on the gradient subspace to uncover the intrinsic link between gradients and inputs. It requires no shadow models, auxiliary classifiers, or historical updates, ensuring efficiency and robustness. Experiments on four benchmarks and four LLMs show that ProjRes achieves near-100% accuracy, outperforming prior methods by up to 75.75%, and remains effective even under strong differential privacy defenses. Our findings reveal a previously overlooked privacy vulnerability in FedLLMs and call for a re-examination of their security assumptions. Our code and data are available at https://anonymous.4open.science/r/Passive-MIA-5268.
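To make the core geometric intuition concrete, here is a minimal, hypothetical sketch of a projection-residual test. It is not the paper's actual algorithm; the subspace construction, embedding source, and decision threshold are all illustrative assumptions. The idea: project a sample's hidden embedding onto the subspace spanned by the observed gradient-update directions, and use the size of the residual (the component left over) as a membership signal.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 8  # embedding dimension, number of observed gradient directions (toy sizes)

# Columns of G stand in for gradient-update directions visible to a passive observer.
G = rng.normal(size=(d, k))
Q, _ = np.linalg.qr(G)  # orthonormal basis of the gradient subspace span(G)

def projection_residual(h, Q):
    """Relative norm of the component of embedding h lying outside span(Q)."""
    residual = h - Q @ (Q.T @ h)  # h minus its orthogonal projection onto span(Q)
    return np.linalg.norm(residual) / np.linalg.norm(h)

# A "member-like" embedding lies mostly inside the gradient subspace (plus small noise)...
member = Q @ rng.normal(size=k) + 0.05 * rng.normal(size=d)
# ...while a "non-member" embedding points in an arbitrary direction.
non_member = rng.normal(size=d)

r_in = projection_residual(member, Q)
r_out = projection_residual(non_member, Q)
assert r_in < r_out  # smaller residual => more member-like under this toy model
```

Under this toy model the member's residual is far smaller than the non-member's, because a random direction in 64 dimensions has most of its mass outside an 8-dimensional subspace. A real attack would build the subspace from actual shared gradient updates and calibrate a decision threshold, which this sketch deliberately omits.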