Say the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Drones

arXiv cs.RO / 5/6/2026


Key Points

  • The paper proposes a mission-agnostic framework that uses an LLM-based agent to translate natural-language mission goals into real-time UAV swarm actions.
  • It integrates an Agent Core with a Model Context Protocol (MCP) gateway and a Web-of-Drones abstraction built on W3C Web of Things (WoT) standards to provide grounded, structured interactions with drones and sensors.
  • Rather than relying on code generation, the system exposes drones/sensors/services as standardized WoT “Things” to enable continuous state observation and safer actuation through tool-based access.
  • Experiments in ArduPilot simulation across four swarm missions and six state-of-the-art LLMs show that general-purpose LLMs often fail at reliable closed-loop execution without explicit grounding and runtime support.
  • Adding task-specific planning tools and runtime guardrails significantly improves robustness, and the study finds that token usage by itself is not a reliable indicator of execution quality.
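
To make the Web-of-Drones abstraction concrete, here is a minimal sketch of a drone exposed as a WoT-style "Thing" with observable properties and typed actions. All names here (`DroneThing`, `goto`, `battery`) are illustrative assumptions, not the paper's actual API; the point is that the agent reads state and actuates through structured tool calls rather than generated code.

```python
class DroneThing:
    """Hypothetical Web-of-Drones Thing: observable state plus typed actions."""

    def __init__(self, thing_id):
        self.id = thing_id
        self.state = {"position": [0.0, 0.0, 0.0], "battery": 100.0}

    def description(self):
        """WoT-style Thing Description the agent can discover at runtime."""
        return {
            "@context": "https://www.w3.org/2022/wot/td/v1.1",
            "id": self.id,
            "properties": {
                "position": {"type": "array"},
                "battery": {"type": "number"},
            },
            "actions": {
                "goto": {"input": {"type": "array", "minItems": 3, "maxItems": 3}},
                "land": {},
            },
        }

    def read_property(self, name):
        """Continuous state observation: the agent polls properties by name."""
        return self.state[name]

    def invoke_action(self, name, payload=None):
        """Structured actuation: the agent invokes named actions, never raw code."""
        if name == "goto":
            self.state["position"] = list(payload)
        elif name == "land":
            self.state["position"][2] = 0.0
        else:
            raise ValueError(f"unknown action: {name}")
        return self.state["position"]


drone = DroneThing("urn:wod:drone-1")  # illustrative URN, not from the paper
drone.invoke_action("goto", [10.0, 5.0, 20.0])
print(drone.read_property("position"))  # [10.0, 5.0, 20.0]
```

In this pattern, the MCP gateway would surface `read_property` and `invoke_action` as tools, so the LLM's output is always a validated tool call against a declared schema.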

Abstract

Large Language Models (LLMs) are increasingly explored as high-level reasoning engines for cyber-physical systems, yet their application to real-time UAV swarm management remains challenging due to heterogeneous interfaces, limited grounding, and the need for long-running closed-loop execution. This paper presents a mission-agnostic, agent-enhanced LLM framework for UAV swarm control, where users express mission objectives in natural language and the system autonomously executes them through grounded, real-time interactions. The proposed architecture combines an LLM-based Agent Core with a Model Context Protocol (MCP) gateway and a Web-of-Drones abstraction based on W3C Web of Things (WoT) standards. By exposing drones, sensors, and services as standardized WoT Things, the framework enables structured tool-based interaction, continuous state observation, and safe actuation without relying on code generation. We evaluate the framework using ArduPilot-based simulation across four swarm missions and six state-of-the-art LLMs. Results show that, despite strong reasoning abilities, current general-purpose LLMs still struggle to achieve reliable execution, even for simple swarm tasks, when operating without explicit grounding and execution support. Task-specific planning tools and runtime guardrails substantially improve robustness, while token consumption alone is not indicative of execution quality or reliability.
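
The "runtime guardrails" the abstract credits with improved robustness can be sketched as a pre-actuation check: an LLM-proposed action is validated against safety constraints before it reaches the drone. The specific geofence radius, altitude band, and battery threshold below are invented for illustration and are not taken from the paper.

```python
# Assumed example limits; the paper does not specify these values.
GEOFENCE = {"max_radius_m": 500.0, "max_alt_m": 120.0}
MIN_BATTERY_PCT = 20.0


def check_goto(waypoint, battery_pct):
    """Guardrail for a hypothetical 'goto' tool call.

    Returns (ok, reason): reject waypoints outside the geofence, outside
    the allowed altitude band, or when battery is too low to fly safely.
    """
    x, y, z = waypoint
    if battery_pct < MIN_BATTERY_PCT:
        return False, "battery below safe threshold"
    if (x * x + y * y) ** 0.5 > GEOFENCE["max_radius_m"]:
        return False, "waypoint outside horizontal geofence"
    if not (0.0 <= z <= GEOFENCE["max_alt_m"]):
        return False, "altitude outside allowed band"
    return True, "ok"


print(check_goto([100.0, 50.0, 30.0], 80.0))  # (True, 'ok')
print(check_goto([600.0, 0.0, 30.0], 80.0))   # (False, 'waypoint outside horizontal geofence')
```

A gateway running such checks turns an unreliable free-form plan into a closed loop in which every actuation is either verified or rejected with a machine-readable reason the agent can replan against.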