SAGA: A Robust Self-Attention and Goal-Aware Anchor-based Planner for Safe UAV Autonomous Navigation

arXiv cs.RO / 5/5/2026

Key Points

  • SAGA is an anchor-based UAV planning method that frames local planning as a one-stage joint regression-and-ranking problem over a fixed lattice of motion anchors.
  • It uses robust self-attention to perform cross-anchor global reasoning by converting anchor-aligned features into geometry-aware tokens, including polar positional encoding derived from anchor yaw and pitch.
  • A goal-aware modulation module injects velocity, acceleration, and target information into the token representation to improve score prediction for candidate motions.
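  • Experiments in cluttered pillar-map environments at maximum speed settings of 2.0, 3.0, and 4.0 m/s show SAGA holding a 100% success rate, while the success rates of YOPO, Ego-planner, and Fast-planner all fall as speed increases. At 4.0 m/s, SAGA also improves on YOPO in average safety, minimum safety, and total flight time.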
  • An ablation that removes the polar positional encoding (SAGA w/o PPE) indicates that PPE is critical for stable reasoning across anchors and for selecting safe trajectories through cluttered scenes.
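
To make the polar positional encoding idea concrete, here is a minimal sketch in plain Python. The summary only states that the encoding is derived from anchor yaw and pitch; the sinusoidal form, the 5 x 3 lattice, the field-of-view values, and the function names `anchor_lattice` and `polar_positional_encoding` are all assumptions made for illustration, not the paper's definition.

```python
import math


def anchor_lattice(n_yaw=5, n_pitch=3,
                   yaw_fov=math.radians(90), pitch_fov=math.radians(60)):
    """Return (yaw, pitch) centers of a fixed grid of motion anchors.

    Hypothetical layout: anchors are spread evenly over the field of view,
    ordered row by row (pitch-major). The paper's lattice may differ.
    """
    anchors = []
    for i in range(n_pitch):
        pitch = ((i + 0.5) / n_pitch - 0.5) * pitch_fov
        for j in range(n_yaw):
            yaw = ((j + 0.5) / n_yaw - 0.5) * yaw_fov
            anchors.append((yaw, pitch))
    return anchors


def polar_positional_encoding(yaw, pitch, n_freqs=2):
    """Assumed sinusoidal encoding of an anchor direction.

    For each frequency 1, 2, 4, ... emit sin and cos of yaw, then of pitch,
    so anchors pointing in nearby directions get nearby encodings.
    """
    enc = []
    for k in range(n_freqs):
        freq = 2 ** k
        for angle in (yaw, pitch):
            enc.append(math.sin(freq * angle))
            enc.append(math.cos(freq * angle))
    return enc


# Usage: one encoding vector per anchor, to be added to (or concatenated
# with) that anchor's feature token before self-attention.
anchors = anchor_lattice()
encodings = [polar_positional_encoding(yaw, pitch) for yaw, pitch in anchors]
```

Without some directional signal of this kind, self-attention treats the anchor tokens as an unordered set, which is one plausible reading of why the ablation without PPE performs worse.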

Abstract

Agile unmanned aerial vehicle (UAV) navigation in cluttered environments demands a planning architecture that is both computationally efficient and structurally expressive enough to reason over multiple feasible motions. This paper presents SAGA, a robust self-attention and goal-aware anchor-based planner for safe UAV autonomous navigation. SAGA formulates local planning as a one-stage joint regression-and-ranking problem over a fixed lattice of motion anchors. Given a depth image and a body-frame motion state, the planner predicts refined terminal states and planning scores for all anchors in a single forward pass, after which the best candidate is decoded into a dynamically feasible trajectory. The key idea of SAGA is to transform anchor-aligned features into geometry-aware tokens and perform cross-anchor global reasoning with self-attention. To preserve directional structure in the token space, we further introduce a polar positional encoding derived from anchor yaw and pitch. In addition, a goal-aware modulation module injects velocity, acceleration, and target information into the token representation before final score prediction. Experiments in cluttered pillar-map environments under maximum speed settings of 2.0, 3.0, and 4.0 m/s show that SAGA consistently achieves a 100% success rate, while YOPO drops from 90.91% to 62.50%, Ego-planner from 71.43% to 52.63%, and Fast-planner from 52.63% to 38.46%. Under the 4.0 m/s maximum speed setting, SAGA also improves average safety from 1.9843 m to 2.3888 m and minimum safety from 0.4390 m to 0.7576 m over YOPO, while reducing total flight time from 40.4631 s to 27.4901 s. The comparison with SAGA w/o PPE further shows that explicit polar positional encoding is critical for stable cross-anchor reasoning and safe passage selection in cluttered scenes.
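
The "regress, rank, then decode" step described in the abstract can be sketched roughly as follows. This is an illustrative guess at the selection logic only: it assumes a higher score means a better candidate, that the regression head outputs small (yaw, pitch, radius) offsets per anchor, and that terminal positions use an x-forward spherical convention in the body frame. The function name `select_best_anchor` and the default radius are hypothetical, and the trajectory decoding itself is omitted.

```python
import math


def select_best_anchor(anchors, offsets, scores, radius=4.0):
    """Pick the top-ranked anchor and return its refined terminal position.

    anchors: list of (yaw, pitch) anchor directions in radians.
    offsets: list of (d_yaw, d_pitch, d_radius) regression outputs (assumed form).
    scores:  list of planning scores, assumed here to be higher-is-better.
    radius:  nominal planning horizon in metres (hypothetical default).
    """
    # Ranking: all anchors are scored in one pass, so selection is an argmax.
    best = max(range(len(scores)), key=lambda i: scores[i])

    # Regression: refine the chosen anchor with its predicted offsets.
    yaw = anchors[best][0] + offsets[best][0]
    pitch = anchors[best][1] + offsets[best][1]
    r = radius + offsets[best][2]

    # Convert the refined direction to a body-frame terminal position.
    endpoint = (
        r * math.cos(pitch) * math.cos(yaw),
        r * math.cos(pitch) * math.sin(yaw),
        r * math.sin(pitch),
    )
    return best, endpoint


# Usage with three hand-made anchors and scores.
anchors = [(-0.5, 0.0), (0.0, 0.0), (0.5, 0.0)]
offsets = [(0.0, 0.0, 0.0)] * 3
best, endpoint = select_best_anchor(anchors, offsets, [0.1, 0.9, 0.3])
```

In a full planner the returned terminal state would then be fitted with a trajectory, for example a polynomial from the current motion state to that endpoint, which is where dynamic feasibility would be enforced.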