Fine-Tuning Large Language Models for Cooperative Tactical Deconfliction of Small Unmanned Aerial Systems
arXiv cs.RO · March 31, 2026
Key Points
- The paper addresses tactical deconfliction for low-altitude small UAVs as a safety-critical multi-agent problem requiring cooperative separation and operational efficiency under partial observability.
- It proposes using fine-tuned LLMs as decision-makers, overcoming limitations of direct LLM use by generating domain-grounded, rule-consistent data via a simulation-to-language pipeline based on the BlueSky simulator.
- A pretrained Qwen-Math-7B is fine-tuned with two parameter-efficient methods: supervised fine-tuning with LoRA and preference-based fine-tuning using LoRA plus GRPO.
- Results on validation datasets and in closed-loop simulations show that supervised fine-tuning with LoRA significantly improves decision accuracy, consistency, and separation performance, including meaningful reductions in near mid-air collision (NMAC) risk.
- Preference-based tuning with GRPO can enhance coordination but shows decreased robustness when interacting with heterogeneous agent policies, indicating trade-offs for real-world deployment.
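The low-rank adaptation (LoRA) used in both fine-tuning variants above can be illustrated with a minimal numerical sketch. This is not the paper's implementation (which fine-tunes Qwen-Math-7B); the matrix names, dimensions, and scaling convention below are the standard LoRA formulation, shown on toy NumPy matrices to make the parameter savings concrete:

```python
import numpy as np

# LoRA keeps the pretrained weight W frozen and learns two small low-rank
# factors A and B, applying the adapted weight W' = W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 2, 4  # toy sizes; in practice rank r << d

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x, W, A, B, alpha, r):
    """Forward pass through the frozen weight plus the low-rank adapter."""
    return x @ (W + (alpha / r) * B @ A).T

x = rng.standard_normal((1, d_in))
# Because B starts at zero, the adapted model initially matches the base model.
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)

# Only r*(d_in + d_out) adapter parameters are trained, not d_in*d_out.
print(r * (d_in + d_out), "adapter params vs", d_in * d_out, "full params")
```

The zero initialization of `B` is what makes LoRA a safe starting point: fine-tuning begins exactly at the pretrained model and only gradually departs from it as the adapter trains.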


