HAVEN: Hierarchical Adversary-aware Visibility-Enabled Navigation with Cover Utilization using Deep Transformer Q-Networks

arXiv cs.RO / 4/21/2026


Key Points

  • The paper addresses autonomous navigation in partially observable environments by explicitly reasoning over occlusion and limited fields of view rather than relying only on immediate sensor readings.
  • It proposes a hierarchical framework that uses a Deep Transformer Q-Network (DTQN) to select high-level subgoals from short task-aware histories and a modular low-level controller to execute the chosen waypoints.
  • Candidate subgoal generation is made visibility-aware through masking and exposure penalties, encouraging the agent to use cover and to favor anticipatory safety.
  • The low-level component uses a potential-field controller to track the selected subgoal while performing smooth short-horizon obstacle avoidance.
  • Experiments in 2D simulation and a 3D Unity-ROS setup (via point-cloud projection into the same feature schema) show improved success rate, safety margins, and time-to-goal versus classical planners and RL baselines, with ablations supporting the value of temporal memory and visibility-aware design.
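The visibility-aware ranking described above can be sketched in a few lines: occluded or invalid candidates are masked out, and exposed candidates have their Q-values discounted before ranking. The scoring rule, `exposure_weight`, and the array shapes here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def rank_subgoals(q_values, visible_mask, exposure, exposure_weight=0.5):
    """Rank candidate subgoals by Q-value with visibility-aware scoring.

    q_values      : per-candidate Q-values from the high-level policy
    visible_mask  : False for candidates that are occluded or invalid
    exposure      : per-candidate exposure estimate (higher = more exposed)
    exposure_weight is a hypothetical trade-off coefficient.
    """
    # Penalize exposure, then hard-mask infeasible candidates to -inf
    scores = np.where(visible_mask,
                      q_values - exposure_weight * exposure,
                      -np.inf)
    order = np.argsort(-scores)  # best candidate first
    return order, scores

q = np.array([1.0, 2.0, 0.5, 1.8])
mask = np.array([True, True, False, True])   # candidate 2 is occluded
exp = np.array([0.2, 1.0, 0.0, 0.1])         # candidate 1 is highly exposed
order, scores = rank_subgoals(q, mask, exp)
print(order[0])  # → 3: high Q-value and low exposure wins over raw-best candidate 1
```

Note how the exposure penalty demotes candidate 1 (highest raw Q-value) below candidate 3, which is nearly as valuable but far less exposed; this is the qualitative behavior the masking-plus-penalty design is meant to produce.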

Abstract

Autonomous navigation in partially observable environments requires agents to reason beyond immediate sensor input, exploit occlusion, and ensure safety while progressing toward a goal. These challenges arise in many robotics domains, from urban driving and warehouse automation to defense and surveillance. Classical path planning approaches and memoryless reinforcement learning often fail under limited fields of view (FoVs) and occlusions, committing to unsafe or inefficient maneuvers. We propose a hierarchical navigation framework that integrates a Deep Transformer Q-Network (DTQN) as a high-level subgoal selector with a modular low-level controller for waypoint execution. The DTQN consumes short histories of task-aware features, encoding odometry, goal direction, obstacle proximity, and visibility cues, and outputs Q-values to rank candidate subgoals. Visibility-aware candidate generation introduces masking and exposure penalties, rewarding the use of cover and anticipatory safety. A low-level potential field controller then tracks the selected subgoal, ensuring smooth short-horizon obstacle avoidance. We validate our approach in 2D simulation and extend it directly to a 3D Unity-ROS environment by projecting point-cloud perception into the same feature schema, enabling transfer without architectural changes. Results show consistent improvements over classical planners and RL baselines in success rate, safety margins, and time to goal, with ablations confirming the value of temporal memory and visibility-aware candidate design. These findings highlight a generalizable framework for safe navigation under uncertainty, with broad relevance across robotic platforms.
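The low-level controller in the abstract is a standard attractive/repulsive potential field tracking the selected subgoal. A minimal sketch, assuming the classic formulation with hypothetical gains `k_att`, `k_rep`, and influence radius `d0` (the paper's actual parameters are not given here):

```python
import numpy as np

def potential_field_step(pos, subgoal, obstacles, k_att=1.0, k_rep=0.5, d0=2.0):
    """One step of an attractive/repulsive potential-field controller.

    Returns a force vector: attraction toward the subgoal plus repulsion
    from each obstacle within influence radius d0. Gains are illustrative.
    """
    force = k_att * (subgoal - pos)  # attractive term toward the subgoal
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 1e-9 < d < d0:
            # Repulsive term grows sharply as the agent nears the obstacle
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (diff / d)
    return force

pos, goal = np.zeros(2), np.array([1.0, 0.0])
f_free = potential_field_step(pos, goal, obstacles=[])
f_near = potential_field_step(pos, goal, obstacles=[np.array([0.5, 0.0])])
print(f_free)  # → [1. 0.]: pure attraction when no obstacle is in range
```

With an obstacle directly between agent and subgoal, the repulsive term dominates and the net force points away from the obstacle, giving the smooth short-horizon avoidance the framework delegates to this layer.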