Dual Pose-Graph Semantic Localization for Vision-Based Autonomous Drone Racing

arXiv cs.RO / April 17, 2026


Key Points

  • The paper proposes a dual pose-graph localization approach for vision-based autonomous drone racing, targeting challenges like motion blur, unstable visual features, and single-camera constraints.
  • It fuses odometry with semantic gate detections: a temporary graph accumulates multiple observations of each gate between keyframes, condenses them into a single refined constraint per landmark, and promotes that constraint to a persistent main graph, preserving real-time performance.
  • The method is presented as sensor-agnostic, with validation using monocular visual-inertial odometry combined with visual gate detection.
  • Experiments on the TII-RATM dataset report a 56%–74% reduction in absolute trajectory error (ATE) versus standalone VIO, and an ablation study shows 10%–12% higher accuracy than a single-graph baseline at the same computational cost.
  • In an A2RL competition deployment, the system enabled real-time onboard localization and reduced odometry drift by up to 4.2 meters per lap during flight.

Abstract

Autonomous drone racing demands robust real-time localization under extreme conditions: high-speed flight, aggressive maneuvers, and payload-constrained platforms that often rely on a single camera for perception. Existing visual SLAM systems, while effective in general scenarios, struggle with motion blur and feature instability inherent to racing dynamics, and do not exploit the structured nature of racing environments. In this work, we present a dual pose-graph architecture that fuses odometry with semantic detections for robust localization. A temporary graph accumulates multiple gate observations between keyframes and optimizes them into a single refined constraint per landmark, which is then promoted to a persistent main graph. This design preserves the information richness of frequent detections while preventing graph growth from degrading real-time performance. The system is designed to be sensor-agnostic, although in this work we validate it using monocular visual-inertial odometry and visual gate detections. Experimental evaluation on the TII-RATM dataset shows a 56% to 74% reduction in ATE compared to standalone VIO, while an ablation study confirms that the dual-graph architecture achieves 10% to 12% higher accuracy than a single-graph baseline at identical computational cost. Deployment in the A2RL competition demonstrated that the system performs real-time onboard localization during flight, reducing the drift of the odometry baseline by up to 4.2 m per lap.
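The accumulate-condense-promote cycle described in the abstract can be illustrated with a minimal sketch. All class and method names below are hypothetical, and the paper's actual local optimization (presumably nonlinear least squares over poses and landmarks) is replaced here by a simple mean of 2D gate observations, purely to show the data flow between the two graphs:

```python
# Illustrative sketch of a dual pose-graph pipeline (not the paper's code).
# A temporary graph collects raw gate detections between keyframes; at each
# keyframe boundary its observations are condensed into one refined
# constraint per landmark, which is then promoted to the persistent main
# graph. This bounds the main graph's size while retaining information
# from frequent detections.

from dataclasses import dataclass, field
from statistics import mean

@dataclass
class TemporaryGraph:
    """Accumulates raw gate observations between keyframes."""
    observations: dict = field(default_factory=dict)  # gate_id -> [(x, y), ...]

    def add_observation(self, gate_id: int, xy: tuple) -> None:
        self.observations.setdefault(gate_id, []).append(xy)

    def condense(self) -> dict:
        """Collapse all observations of each gate into one refined
        constraint (here: the mean position, standing in for a proper
        local optimization), then clear the temporary graph."""
        refined = {}
        for gate_id, obs in self.observations.items():
            xs, ys = zip(*obs)
            refined[gate_id] = (mean(xs), mean(ys))
        self.observations.clear()
        return refined

@dataclass
class MainGraph:
    """Persistent graph holding one refined constraint per landmark."""
    constraints: dict = field(default_factory=dict)

    def promote(self, refined: dict) -> None:
        self.constraints.update(refined)

# Usage: between keyframes, noisy detections of gate 7 accumulate in the
# temporary graph; at the keyframe boundary they are condensed and promoted.
temp, main = TemporaryGraph(), MainGraph()
for xy in [(1.0, 2.0), (1.5, 2.5), (0.5, 1.5)]:
    temp.add_observation(7, xy)
main.promote(temp.condense())
print(main.constraints[7])  # -> (1.0, 2.0)
```

The point of the two-graph split, per the abstract, is that the main graph grows by one constraint per landmark per keyframe rather than one edge per detection, which is what keeps optimization cost compatible with onboard real-time operation.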