SR-Nav: Spatial Relationships Matter for Zero-shot Object Goal Navigation

arXiv cs.CV / 3/20/2026

Key Points

  • SR-Nav introduces a Spatial Relation-aware framework for zero-shot object-goal navigation that leverages both observed and experience-based spatial relationships to improve perception and planning with foundation models.
  • It builds a Dynamic Spatial Relationship Graph (DSRG) that encodes target-centered spatial relations and updates it in real time as observations change.
  • A Relation-aware Matching Module uses diverse relationships from the DSRG to verify and correct detections, enhancing robustness of visual perception.
  • A Dynamic Relationship Planning Module reduces the planning search space by computing optimal paths from the DSRG, guiding planning and lowering exploration redundancy.
  • Experiments on HM3D report state-of-the-art performance in both success rate and navigation efficiency; the authors state that the code will be released on GitHub at https://github.com/Mzyw-1314/SR-Nav.
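
To make the graph and matching ideas above concrete, here is a minimal Python sketch. It is not the authors' implementation: the summary gives no data structures or formulas, so the `RelationGraph` class, the `verify_detection` function, the prior weights, and the linear fusion with `alpha` are all assumptions chosen for illustration. It shows one plausible way that relationships observed around a candidate detection could confirm or reject what a detector reports.

```python
from dataclasses import dataclass, field


@dataclass
class RelationGraph:
    """Toy target-centered relationship graph (hypothetical, not the paper's DSRG)."""

    target: str
    # Assumed prior strength in [0, 1] that a label co-occurs with the target.
    priors: dict = field(default_factory=dict)
    # Labels observed so far near each candidate detection of the target.
    observed: dict = field(default_factory=dict)

    def observe(self, candidate_id, nearby_labels):
        """Merge newly seen neighbours of a candidate into the graph."""
        self.observed.setdefault(candidate_id, set()).update(nearby_labels)

    def relation_score(self, candidate_id):
        """Fraction of total prior weight that is supported by observations."""
        total = sum(self.priors.values())
        if total == 0:
            return 0.0
        seen = self.observed.get(candidate_id, set())
        matched = sum(w for label, w in self.priors.items() if label in seen)
        return matched / total


def verify_detection(graph, candidate_id, detector_conf, alpha=0.5, threshold=0.5):
    """Accept a detection only if detector and relationship evidence agree enough.

    The linear fusion and the threshold are illustrative assumptions.
    """
    fused = alpha * detector_conf + (1 - alpha) * graph.relation_score(candidate_id)
    return fused >= threshold


graph = RelationGraph("tv", priors={"sofa": 0.8, "tv stand": 0.9, "living room": 0.7})
graph.observe("c1", {"sofa", "tv stand"})  # weak detection, strong context
graph.observe("c2", {"bathtub"})           # stronger detection, implausible context
print(verify_detection(graph, "c1", detector_conf=0.4))
print(verify_detection(graph, "c2", detector_conf=0.6))
```

In this toy setup a low-confidence detection surrounded by a sofa and a TV stand is accepted, while a higher-confidence detection next to a bathtub is rejected, which is the kind of correction the Relation-aware Matching Module is described as providing.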

Abstract

Zero-shot object-goal navigation aims to find target objects in unseen environments using only egocentric observations. Recent methods leverage foundation models' comprehension and reasoning capabilities to enhance navigation performance. However, when faced with poor viewpoints or weak semantic cues, foundation models often fail to support reliable reasoning in both perception and planning, resulting in inefficient or failed navigation. We observe that inherent relationships among objects and regions encode structured scene priors, which help agents infer plausible target locations even under partial observations. Motivated by this insight, we propose Spatial Relation-aware Navigation (SR-Nav), a framework that models both observed and experience-based spatial relationships to enhance perception and planning. Specifically, SR-Nav first constructs a Dynamic Spatial Relationship Graph (DSRG) that encodes target-centered spatial relationships through foundation models and is updated dynamically with real-time observations. We then introduce a Relation-aware Matching Module, which replaces naive detection with relationship matching and uses the diverse relationships in the DSRG to verify and correct detection errors, making visual perception more robust. Finally, we design a Dynamic Relationship Planning Module that reduces the planning search space by dynamically computing optimal paths from the current position based on the DSRG, thereby guiding planning and reducing exploration redundancy. Experiments on HM3D show that our method achieves state-of-the-art performance in both success rate and navigation efficiency. The code will be publicly available at https://github.com/Mzyw-1314/SR-Nav.
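
The planning step described in the abstract can be illustrated with a small Python sketch. The abstract does not specify how optimal paths are computed or how they are combined with relationship priors, so everything here is an assumption: the region adjacency map, the `target_prior` values, the function names `shortest_costs` and `rank_goals`, and the `prior / (1 + cost)` scoring rule. Dijkstra's algorithm stands in for whatever path computation SR-Nav actually uses.

```python
import heapq


def shortest_costs(adjacency, start):
    """Dijkstra's algorithm: cheapest travel cost from start to each reachable node."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in adjacency.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def rank_goals(adjacency, start, target_prior, top_k=2):
    """Keep only the top_k regions, trading target likelihood against path cost.

    The scoring rule prior / (1 + cost) is an illustrative assumption.
    """
    dist = shortest_costs(adjacency, start)
    scored = [
        (prior / (1.0 + dist[node]), node)
        for node, prior in target_prior.items()
        if node in dist and prior > 0
    ]
    scored.sort(reverse=True)
    return [node for _, node in scored[:top_k]]


# Hypothetical region graph with travel costs, and assumed priors for a "tv" target.
adjacency = {
    "hall": {"kitchen": 2.0, "living room": 4.0},
    "kitchen": {"hall": 2.0, "bathroom": 3.0},
    "living room": {"hall": 4.0},
    "bathroom": {"kitchen": 3.0},
}
target_prior = {"living room": 0.9, "kitchen": 0.2, "bathroom": 0.05}
print(rank_goals(adjacency, "hall", target_prior))
```

Restricting the planner to the few highest-scoring regions is one simple way to shrink the search space; rerunning `rank_goals` after each graph update would mirror the dynamic recomputation from the current position that the abstract describes.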