QuadAgent: A Responsive Agent System for Vision-Language Guided Quadrotor Agile Flight

arXiv cs.RO / 4/6/2026

Key Points

QuadAgent is presented as a training-free vision-language-guided agent system designed for agile quadrotor flight, aiming to interpret complex user instructions in real time.
The approach decouples high-level reasoning from low-level control via an asynchronous multi-agent architecture, using Foreground Workflow Agents for active tasks and Background Agents for look-ahead reasoning.
Scene understanding and continuity are supported by an “Impression Graph,” a lightweight topological memory built from sparse keyframes.
A vision-based obstacle avoidance network handles safety during navigation, enabling flight in cluttered indoor environments.
In simulation, QuadAgent is reported to outperform baseline methods in efficiency and responsiveness; real-world demonstrations reach speeds of up to 5 m/s in cluttered indoor spaces.
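
To make the "Impression Graph" idea from the key points more concrete, here is a minimal Python sketch of a topological memory built from sparse keyframes. It is an illustration under stated assumptions, not the paper's implementation: the class and method names (`ImpressionGraph`, `Keyframe`, `observe`, `route`), the distance-based keyframe rule, the `min_spacing` value, and the text `caption` field are all hypothetical. The summary does not say how keyframes are selected, what each node stores, or how the graph is queried.

```python
import math
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Keyframe:
    node_id: int
    position: Tuple[float, float, float]
    caption: str  # hypothetical: a short text impression of the view


@dataclass
class ImpressionGraph:
    min_spacing: float = 1.0  # assumed keyframe spacing in metres
    nodes: Dict[int, Keyframe] = field(default_factory=dict)
    edges: Dict[int, Set[int]] = field(default_factory=dict)
    last_id: Optional[int] = None

    def observe(self, position, caption):
        """Add a keyframe only if the drone moved far enough from the last one."""
        if self.last_id is not None:
            last = self.nodes[self.last_id]
            if math.dist(last.position, position) < self.min_spacing:
                return None  # too close: skip to keep the graph sparse
        node_id = len(self.nodes)
        self.nodes[node_id] = Keyframe(node_id, tuple(position), caption)
        self.edges[node_id] = set()
        if self.last_id is not None:
            # Link consecutive keyframes so the graph records traversability.
            self.edges[node_id].add(self.last_id)
            self.edges[self.last_id].add(node_id)
        self.last_id = node_id
        return node_id

    def route(self, start, goal):
        """Breadth-first search over keyframes; returns node ids or None."""
        parents = {start: None}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                path = []
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            for neighbour in self.edges[current]:
                if neighbour not in parents:
                    parents[neighbour] = current
                    queue.append(neighbour)
        return None


graph = ImpressionGraph(min_spacing=1.0)
graph.observe((0.0, 0.0, 1.0), "doorway")
graph.observe((0.2, 0.0, 1.0), "doorway, slightly closer")  # skipped: under 1 m
graph.observe((2.0, 0.0, 1.0), "corridor")
graph.observe((2.0, 3.0, 1.0), "red chair")
print(graph.route(0, 2))  # [0, 1, 2]
```

In this sketch the graph stays lightweight because it stores only a handful of nodes and adjacency sets rather than a dense metric map; a reasoning agent could look up a node by its caption and ask for a route to it.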

Abstract

We present QuadAgent, a training-free agent system for agile quadrotor flight guided by vision-language inputs. Unlike prior end-to-end or serial agent approaches, QuadAgent decouples high-level reasoning from low-level control using an asynchronous multi-agent architecture: Foreground Workflow Agents handle active tasks and user commands, while Background Agents perform look-ahead reasoning. The system maintains scene memory via the Impression Graph, a lightweight topological map built from sparse keyframes, and ensures safe flight with a vision-based obstacle avoidance network. Simulation results show that QuadAgent outperforms baseline methods in efficiency and responsiveness. Real-world experiments demonstrate that it can interpret complex instructions, reason about its surroundings, and navigate cluttered indoor spaces at speeds up to 5 m/s.
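
The asynchronous split between foreground and background agents described in the abstract can be sketched with Python's `asyncio`. This is a toy illustration of the general pattern only, not QuadAgent's code: the function names, the shared `scene_notes` dictionary, the `None` shutdown sentinel, and the sleep that stands in for a slow model call are all assumptions made for the example.

```python
import asyncio


async def background_agent(scene_notes, stop):
    """Look-ahead reasoning: slow, runs continuously, never blocks commands."""
    step = 0
    while not stop.is_set():
        await asyncio.sleep(0.05)  # stands in for a slow reasoning call
        step += 1
        scene_notes["lookahead"] = f"plan draft {step}"


async def foreground_agent(commands, scene_notes, log):
    """Handle user commands as they arrive, using whatever notes exist so far."""
    while True:
        command = await commands.get()
        if command is None:  # hypothetical shutdown sentinel
            break
        log.append((command, scene_notes.get("lookahead")))


async def main():
    commands = asyncio.Queue()
    scene_notes, log = {}, []
    stop = asyncio.Event()
    bg = asyncio.create_task(background_agent(scene_notes, stop))
    fg = asyncio.create_task(foreground_agent(commands, scene_notes, log))

    await commands.put("fly to the red chair")
    await asyncio.sleep(0.12)  # background keeps reasoning in the meantime
    await commands.put("stop and hover")
    await commands.put(None)
    await fg

    stop.set()
    await bg
    return log


if __name__ == "__main__":
    for command, lookahead in asyncio.run(main()):
        print(command, "->", lookahead)
```

The first command is handled immediately, before any look-ahead result exists, while the second one can draw on whatever the background task has produced by then. A serial design would instead make every command wait for the slow reasoning step to finish.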