Unveiling the Surprising Efficacy of Navigation Understanding in End-to-End Autonomous Driving

arXiv cs.RO / 4/15/2026


Key Points

  • The paper argues that many end-to-end autonomous driving systems underuse global navigation information and over-rely on local scene understanding, so their planning output correlates only weakly with the navigation input and they struggle to follow navigation in complex scenarios.
  • It proposes the Sequential Navigation Guidance (SNG) framework to represent global navigation using real-world navigation patterns, combining path constraints for long-horizon trajectories with turn-by-turn cues for real-time decisions.
  • The authors introduce the SNG-QA dataset (a VQA dataset built on SNG) to better align global navigation cues with local planning.
  • They also present the SNG-VLA model, which fuses local and global planning via navigation modeling; the authors report state-of-the-art performance without relying on auxiliary perception loss functions.

Abstract

Global navigation information and local scene understanding are two crucial components of autonomous driving systems. However, our experimental results indicate that many end-to-end autonomous driving systems tend to over-rely on local scene understanding while failing to utilize global navigation information. These systems exhibit weak correlation between their planning capabilities and navigation input, and struggle to perform navigation-following in complex scenarios. To overcome this limitation, we propose the Sequential Navigation Guidance (SNG) framework, an efficient representation of global navigation information based on real-world navigation patterns. SNG encompasses both navigation paths for constraining long-term trajectories and turn-by-turn (TBT) information for real-time decision-making logic. We construct the SNG-QA dataset, a visual question answering (VQA) dataset based on SNG that aligns global and local planning. Additionally, we introduce an efficient model, SNG-VLA, that fuses local planning with global planning. SNG-VLA achieves state-of-the-art performance through precise navigation information modeling without requiring auxiliary loss functions from perception tasks. Project page: SNG-VLA
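
To make the two parts of SNG concrete, the following is a minimal illustrative sketch of how a navigation path and turn-by-turn cues might be held together in one structure. It is not the authors' implementation: the class names, fields, maneuver vocabulary, and the text rendering in to_prompt are all assumptions made for illustration, and the abstract does not specify how SNG is actually encoded.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Maneuver(Enum):
    # Hypothetical maneuver vocabulary; the paper's actual TBT set is not given here.
    GO_STRAIGHT = "go straight"
    TURN_LEFT = "turn left"
    TURN_RIGHT = "turn right"


@dataclass(frozen=True)
class TurnByTurnCue:
    maneuver: Maneuver
    distance_m: float  # distance along the path, from its start, to the maneuver


@dataclass(frozen=True)
class SequentialNavigationGuidance:
    """Hypothetical container pairing a route with its turn-by-turn cues."""

    path: List[Tuple[float, float]]  # ordered (x, y) waypoints in metres
    cues: List[TurnByTurnCue]

    def path_length(self) -> float:
        # Sum of straight-line segment lengths between consecutive waypoints.
        return sum(math.dist(a, b) for a, b in zip(self.path, self.path[1:]))

    def next_cue(self, travelled_m: float) -> Optional[TurnByTurnCue]:
        # The nearest cue that has not yet been passed, or None if none remain.
        upcoming = [c for c in self.cues if c.distance_m >= travelled_m]
        return min(upcoming, key=lambda c: c.distance_m, default=None)

    def to_prompt(self, travelled_m: float) -> str:
        # Render the upcoming cue as text, e.g. for a VQA-style question.
        cue = self.next_cue(travelled_m)
        if cue is None:
            return "Follow the route."
        remaining = cue.distance_m - travelled_m
        return f"In {remaining:.0f} m, {cue.maneuver.value}."
```

In this sketch the path plays the role of the long-horizon constraint, while next_cue and to_prompt stand in for the real-time turn-by-turn signal. A real system would likely express both in the ego vehicle's coordinate frame and update them every planning step.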