Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning

arXiv cs.RO / 5/5/2026


Key Points

  • The paper studies whether multi-agent reinforcement learning can reach an equilibrium for tactical deconfliction among heterogeneous fleets of small unmanned aerial systems operating in dense urban airspace.
  • It asks two key questions: whether conflict-free separation policies converge to an equilibrium, and whether those converged policies unfairly discriminate against fleets with weaker configurations.
  • An attention-enhanced PPOA2C (Proximal Policy Optimization-based Advantage Actor-Critic) framework is used, with each fleet independently training its own policy while preserving privacy (see the sketch after this list).
  • Experiments on package-delivery scenarios over Dallas, Texas show that two fleets, each sharing its own independently trained PPOA2C policy, can reach an equilibrium for safe separation and outperform strong rule-based baselines in conflict resolution.
  • Policy-configuration evaluations indicate that equilibria between similar policy types tend to favor fleets with stronger configurations; even under similar configurations but different policy types, the equilibrium favors one policy, underscoring the need for fairness-aware conflict management.
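
The paper's exact network is not reproduced here, but the sketch below illustrates what an attention-enhanced actor-critic of this kind might look like, assuming a PyTorch implementation with a discrete action space; the layer sizes, the observation layout (an ownship state plus a variable-size set of nearby intruders), and the use of nn.MultiheadAttention are illustrative assumptions rather than the authors' architecture.

```python
# Hypothetical sketch of an attention-enhanced actor-critic, in the spirit of
# the PPOA2C framework described above. Dimensions, the 5-action discrete
# action space, and the attention layout are assumptions for illustration.
import torch
import torch.nn as nn

class AttentionActorCritic(nn.Module):
    def __init__(self, own_dim=8, intruder_dim=6, embed_dim=64, n_actions=5):
        super().__init__()
        self.own_enc = nn.Linear(own_dim, embed_dim)        # encode ownship state
        self.intr_enc = nn.Linear(intruder_dim, embed_dim)  # encode each intruder
        # The ownship embedding queries a variable-size set of nearby intruders,
        # so the policy handles dense traffic without a fixed input size.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)
        self.actor = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
        self.critic = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, own_state, intruders, pad_mask=None):
        # own_state: (B, own_dim); intruders: (B, K, intruder_dim)
        q = self.own_enc(own_state).unsqueeze(1)               # (B, 1, E) query
        kv = self.intr_enc(intruders)                          # (B, K, E) keys/values
        ctx, _ = self.attn(q, kv, kv, key_padding_mask=pad_mask)
        h = torch.cat([q.squeeze(1), ctx.squeeze(1)], dim=-1)  # (B, 2E)
        return self.actor(h), self.critic(h)                   # action logits, value

# Example forward pass: a batch of 32 aircraft, each observing 4 intruders.
logits, value = AttentionActorCritic()(torch.randn(32, 8), torch.randn(32, 4, 6))
```

Attention over the intruder set is a natural fit here because the number of nearby aircraft varies with traffic density, and a fixed-size input would otherwise have to truncate or zero-pad that set.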

Abstract

In the envisioned future dense urban airspace, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs), where each fleet includes several homogeneous aircraft with identical policies and configurations, e.g., equipage, sensing, and communication ranges, making tactical deconfliction highly complex for the aircraft. This paper aims to address two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium to ensure a conflict-free airspace when companies operate heterogeneous fleets of homogeneous aircraft? (2) If so, will the converged policies discriminate against companies operating sUASs with weaker configurations? We investigate a multi-agent reinforcement learning paradigm in which homogeneous aircraft within heterogeneous fleets operate concurrently to perform package delivery missions over Dallas, Texas, USA. An attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework is employed to resolve intra- and inter-fleet conflicts, with each fleet independently training its own policy while preserving privacy. Experimental results show that two fleets with distinct, shared PPOA2C policies can reach an equilibrium to maintain safe separation. While two PPOA2C policies outperform two strong rule-based baselines in terms of conflict resolution, a PPOA2C policy exhibits safer interaction with a rule-based policy, indicating the adaptive capability of PPOA2C policies. Furthermore, we conduct extensive policy-configuration evaluations, which reveal that equilibria between similar policy types tend to favor fleets with stronger configurations. Even under similar configurations but different policy types, the equilibrium favors one of the heterogeneous policies, underscoring the need for fairness-aware conflict management in heterogeneous sUAS operations.
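
The privacy-preserving, per-fleet training described above can be pictured as each fleet holding its own policy, optimizer, and rollout buffer, and running a clipped-PPO update only on its own aircraft's transitions. The sketch below assumes a generic policy(obs) -> (action logits, state value) interface and standard PPO hyperparameter values; it is a minimal illustration, not the authors' implementation.

```python
# Minimal sketch of an independent, per-fleet PPO-style update: each fleet
# trains on its own transitions and shares nothing with the other fleet.
# The policy(obs) -> (action logits, state value) interface and the
# hyperparameter values are assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F

def ppo_update(policy, optimizer, obs, actions, old_logp, advantages, returns,
               clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    logits, values = policy(obs)
    dist = torch.distributions.Categorical(logits=logits)
    ratio = torch.exp(dist.log_prob(actions) - old_logp)  # importance ratio
    # Clipped surrogate objective combined with an A2C-style value loss
    # and an entropy bonus that keeps exploration alive.
    surrogate = torch.min(
        ratio * advantages,
        torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages)
    loss = (-surrogate.mean()
            + value_coef * F.mse_loss(values.squeeze(-1), returns)
            - entropy_coef * dist.entropy().mean())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Each fleet keeps its own policy/optimizer/buffer; updates never mix fleets:
# for fleet in fleets:                       # e.g. fleet A and fleet B
#     ppo_update(fleet.policy, fleet.optimizer, *fleet.buffer.as_tensors())
```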