Byzantine-Robust and Communication-Efficient Distributed Training: Compressive and Cyclic Gradient Coding

arXiv cs.AI / 4/1/2026


Key Points

  • The paper studies distributed training under Byzantine attacks while accounting for communication constraints and the challenge of data heterogeneity across devices.
  • It introduces cyclic gradient coding-based distributed training (LAD), where the server pre-assigns the dataset and uses redundant, cyclic computational task allocation each iteration to improve robustness even when device data differs.
  • LAD encodes gradients at honest devices, then the server aggregates coded vectors using a robust aggregation rule that tolerates adversarial (Byzantine) messages.
  • The authors provide an analytical convergence characterization showing improved robustness and substantially lower solution error than prior robust aggregation approaches.
  • They further propose a communication-efficient extension (Com-LAD) that reduces communication overhead and validate both resilience and efficiency with numerical experiments.
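The cyclic, redundant task allocation described above can be sketched as follows. This is a minimal illustration of the general cyclic gradient coding idea, not the paper's exact scheme: with `n` devices and redundancy `r`, device `i` is assigned the `r` consecutive data subsets starting at index `i`, wrapping around cyclically, so each subset is computed by exactly `r` devices.

```python
def cyclic_assignment(num_devices, redundancy):
    """Assign each device `redundancy` consecutive data subsets, wrapping
    cyclically, so every subset is replicated across `redundancy` devices.
    Hypothetical sketch; the paper's allocation may differ in detail."""
    return {
        dev: [(dev + k) % num_devices for k in range(redundancy)]
        for dev in range(num_devices)
    }

# With 5 devices and redundancy 3:
assignment = cyclic_assignment(num_devices=5, redundancy=3)
# device 0 computes on subsets [0, 1, 2]; device 4 wraps to [4, 0, 1].
```

The redundancy is what lets the server cross-check gradients: any subset's gradient arrives from `r` devices, so up to `r - 1` Byzantine copies can be outvoted.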

Abstract

In this paper, we study the problem of distributed training (DT) under Byzantine attacks with communication constraints. While prior work has developed various robust aggregation rules at the server to enhance robustness to Byzantine attacks, the existing methods suffer from a critical limitation: the solution error does not diminish when the local gradients sent by different devices vary considerably, as a result of data heterogeneity among the subsets held by different devices. To overcome this limitation, we propose a novel DT method, cyclic gradient coding-based DT (LAD). In LAD, the server allocates the entire training dataset to the devices before training begins. In each iteration, it assigns computational tasks redundantly to the devices using cyclic gradient coding. Each honest device then computes local gradients on a fixed number of data subsets and encodes the local gradients before transmitting them to the server. The server aggregates the coded vectors from the honest devices and the potentially incorrect messages from Byzantine devices using a robust aggregation rule. Leveraging the redundancy of computation across devices, the convergence performance of LAD is analytically characterized, demonstrating improved robustness against Byzantine attacks and significantly lower solution error. Furthermore, we extend LAD to a communication-efficient variant, compressive and cyclic gradient coding-based DT (Com-LAD), which further reduces communication overhead under constrained settings. Numerical results validate the effectiveness of the proposed methods in enhancing both Byzantine resilience and communication efficiency.
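To make the server-side aggregation step concrete, here is a minimal sketch using the coordinate-wise median, a standard Byzantine-robust aggregation rule. The paper does not specify this particular rule, so treat it as a stand-in: the point is that a robust statistic over the received vectors bounds the influence of the Byzantine messages, whereas a plain average would be dragged arbitrarily far.

```python
import numpy as np

def robust_aggregate(messages):
    """Coordinate-wise median over received vectors (honest coded gradients
    plus possibly corrupted Byzantine messages). A common robust rule used
    here for illustration; LAD's actual aggregation rule may differ."""
    return np.median(np.stack(messages), axis=0)

# Three honest devices send similar coded gradients; one Byzantine device
# sends an arbitrary vector.
honest = [np.array([1.0, 2.0]), np.array([1.1, 2.1]), np.array([0.9, 1.9])]
byzantine = [np.array([100.0, -100.0])]
agg = robust_aggregate(honest + byzantine)
# agg stays close to the honest gradients despite the outlier.
```

Under Com-LAD, the devices would additionally compress the coded vectors (e.g., by quantization or sparsification) before transmission; the aggregation step itself is unchanged.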
