Zero-Shot Scalable Resilience in UAV Swarms: A Decentralized Imitation Learning Framework with Physics-Informed Graph Interactions

arXiv cs.LG / 4/20/2026

Key Points

  • The paper addresses how large UAV swarm failures can fragment a network into disconnected sub-networks, making decentralized recovery urgent yet difficult.
  • It proposes PhyGAIL, a decentralized-execution framework trained centrally that builds bounded local interaction graphs from heterogeneous observations to remain scale-invariant.
  • PhyGAIL uses physics-informed graph neural network components with gated message passing to encode directional attraction/repulsion, giving coordination behavior grounded in physical constraints.
  • The approach includes scenario-adaptive imitation learning to handle fragmented topologies and variable-length recovery episodes, and provides theoretical analysis on bounded graph amplification and controlled success-signal variance.
  • Experiments show a policy trained on 20-UAV swarms transfers directly to swarms up to 500 UAVs without fine-tuning, improving reconnection reliability, recovery speed, motion safety, and runtime efficiency over baselines.
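The scale-invariance claim in the bullets above rests on each UAV reasoning only over a bounded local neighborhood. As a minimal sketch of that idea, the snippet below builds a degree-capped local graph from positions within a communication radius; the function name, `comm_radius`, and `max_degree` are illustrative assumptions, not details taken from the paper.

```python
import math

def build_local_graph(positions, ego, comm_radius=50.0, max_degree=6):
    """Hypothetical sketch: the ego UAV keeps only its max_degree nearest
    neighbors within comm_radius, so its local interaction graph stays
    bounded no matter how large the full swarm grows."""
    ex, ey = positions[ego]
    # Collect (distance, index) pairs for every UAV inside sensing range.
    in_range = []
    for j, (x, y) in enumerate(positions):
        if j == ego:
            continue
        d = math.hypot(x - ex, y - ey)
        if d <= comm_radius:
            in_range.append((d, j))
    # Bounded degree: keep only the closest max_degree neighbors.
    in_range.sort()
    return [j for _, j in in_range[:max_degree]]
```

Because the neighbor list is capped, a policy consuming these local graphs sees inputs of the same bounded size whether the swarm has 20 or 500 members, which is one plausible mechanism for zero-shot transfer across scales.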

Abstract

Large-scale Unmanned Aerial Vehicle (UAV) failures can split a swarm network into disconnected sub-networks, making decentralized recovery both urgent and difficult. Centralized recovery methods depend on global topology information and become communication-heavy after severe fragmentation. Decentralized heuristics and multi-agent reinforcement learning methods are easier to deploy, but their performance often degrades as swarm scale and damage severity vary. We present the Physics-informed Graph Adversarial Imitation Learning algorithm (PhyGAIL), which adopts centralized training with decentralized execution. PhyGAIL builds bounded local interaction graphs from heterogeneous observations and uses a physics-informed graph neural network to encode directional local interactions as gated message passing with explicit attraction and repulsion. This gives the policy a physically grounded coordination bias while keeping local observations scale-invariant. It also uses scenario-adaptive imitation learning to improve training under fragmented topologies and variable-length recovery episodes. Our analysis establishes bounded local graph amplification, bounded interaction dynamics, and controlled variance of the terminal success signal. A policy trained on 20-UAV swarms transfers directly to swarms of up to 500 UAVs without fine-tuning, and outperforms representative baselines in reconnection reliability, recovery speed, motion safety, and runtime efficiency.
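The abstract's "gated message passing with explicit attraction and repulsion" can be sketched in miniature: a directional message whose sign flips between attraction (neighbor too far) and repulsion (neighbor too close), with a sigmoid gate bounding its magnitude, consistent with the bounded-interaction-dynamics claim. Everything here (`rest_dist`, `gate_w`, the exact gating form) is an assumed illustration, not the paper's actual network.

```python
import math

def gated_message(ego_pos, nbr_pos, gate_w=1.0, rest_dist=10.0):
    """Illustrative attraction/repulsion message between two UAVs.
    The sign of (d - rest_dist) selects attraction (too far) or
    repulsion (too close); a sigmoid gate keeps the magnitude in (0, 1),
    so per-edge interaction dynamics stay bounded."""
    dx = nbr_pos[0] - ego_pos[0]
    dy = nbr_pos[1] - ego_pos[1]
    d = math.hypot(dx, dy) or 1e-9
    ux, uy = dx / d, dy / d          # unit direction toward neighbor
    mag = d - rest_dist              # >0: pull closer; <0: push away
    gate = 1.0 / (1.0 + math.exp(-gate_w * abs(mag)))
    sign = 1.0 if mag > 0 else -1.0
    return (sign * gate * ux, sign * gate * uy)

def aggregate(ego_pos, neighbor_positions):
    """Sum gated directional messages over the bounded neighbor set."""
    mx = my = 0.0
    for p in neighbor_positions:
        gx, gy = gated_message(ego_pos, p)
        mx += gx
        my += gy
    return (mx, my)
```

With a bounded neighbor set and per-edge messages bounded by the gate, the aggregated message is bounded as well, which mirrors the kind of bounded-amplification argument the abstract alludes to.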