PoiCGAN: A Targeted Poisoning Based on Feature-Label Joint Perturbation in Federated Learning

arXiv cs.LG / 2026/3/26


Key Points

  • The paper introduces PoiCGAN, a targeted poisoning attack for federated learning that uses feature–label joint perturbations to compromise industrial image classification models without triggering common anomaly defenses.
  • PoiCGAN is built on a conditional GAN whose generator/discriminator input modifications guide training to produce poisoned samples while also automatically performing label flipping.
  • Experiments on multiple datasets show an attack success rate improvement of 83.97% over baseline poisoning methods while keeping main-task accuracy degradation under 8.87%.
  • The authors report that both the crafted poisoned samples and the resulting malicious models are highly stealthy, making them harder to detect and remove via model performance tests or anomaly-based defenses.
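
The feature–label joint perturbation described above can be sketched in a few lines. This is a minimal illustrative stand-in, not the paper's implementation: in PoiCGAN the feature perturbation would come from the trained conditional-GAN generator, whereas here a random perturbation plays that role, and `poison_batch`, `target_label`, and `eps` are hypothetical names.

```python
import numpy as np

def poison_batch(x, y, target_label, rng, eps=0.1):
    """Illustrative feature-label joint perturbation (not the paper's code).

    A trained CGAN generator would supply the perturbation in PoiCGAN;
    a small random perturbation stands in for it here.
    """
    delta = eps * rng.standard_normal(x.shape)   # stand-in for G(z, target_label)
    x_poison = np.clip(x + delta, 0.0, 1.0)      # keep samples in a valid image range
    y_poison = np.full_like(y, target_label)     # automatic label flipping to the target class
    return x_poison, y_poison

rng = np.random.default_rng(0)
x = rng.random((4, 28 * 28))   # toy batch of flattened "industrial" images
y = np.array([0, 1, 2, 3])     # original clean labels
xp, yp = poison_batch(x, y, target_label=7, rng=rng)
```

The key property the paper emphasizes is that feature perturbation and label flipping happen jointly, so the poisoned pair `(xp, yp)` looks self-consistent to the victim model during training.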

Abstract

Federated Learning (FL), as a popular distributed learning paradigm, has shown outstanding performance in improving computational efficiency and protecting data privacy, and is widely applied in industrial image classification. However, due to its distributed nature, FL is vulnerable to threats from malicious clients, with poisoning attacks being a common threat. A major limitation of existing poisoning attack methods is their difficulty in bypassing model performance tests and defense mechanisms based on model anomaly detection. This often results in the detection and removal of poisoned models, which undermines their practical utility. To preserve main-task performance on industrial image classification while maintaining attack effectiveness, we propose a targeted poisoning attack, PoiCGAN, based on feature-label collaborative perturbation. Our method modifies the inputs of the discriminator and generator in the Conditional Generative Adversarial Network (CGAN) to influence the training process, yielding an ideal poison generator. This generator not only produces specific poisoned samples but also automatically performs label flipping. Experiments across various datasets show that our method achieves an attack success rate 83.97% higher than baseline methods, with a less than 8.87% reduction in the main task's accuracy. Moreover, the poisoned samples and malicious models exhibit high stealthiness.
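
The abstract's central mechanism, modifying the inputs of the CGAN's generator and discriminator so that training is conditioned on attacker-chosen labels, can be sketched structurally. This is a hedged sketch of standard conditional-GAN input conditioning, not the paper's exact architecture; all function names and dimensions are illustrative assumptions.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Encode integer class labels as one-hot vectors."""
    out = np.zeros((len(labels), n_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def generator_input(z, target_labels, n_classes):
    """Condition the generator: concatenate latent noise with the
    attacker's target (flipped) label, as in a standard CGAN."""
    return np.concatenate([z, one_hot(target_labels, n_classes)], axis=1)

def discriminator_input(x, labels, n_classes):
    """Condition the discriminator: concatenate a (real or generated)
    sample with the label it is claimed to belong to."""
    return np.concatenate([x, one_hot(labels, n_classes)], axis=1)

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 100))   # latent noise batch
x = rng.random((4, 28 * 28))        # flattened toy image batch
target = np.array([7, 7, 7, 7])     # attacker-chosen target class

g_in = generator_input(z, target, n_classes=10)      # shape (4, 110)
d_in = discriminator_input(x, target, n_classes=10)  # shape (4, 794)
```

Feeding the target label into both networks is what lets the trained generator emit samples whose features already match the flipped label, which is why, per the abstract, label flipping happens "automatically" rather than as a separate post-hoc step.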