TrajRAG: Retrieving Geometric-Semantic Experience for Zero-Shot Object Navigation

arXiv cs.CV / 5/5/2026


Key Points

TrajRAG is a retrieval-augmented generation framework for zero-shot Object Goal Navigation (ObjectNav) that supplements the internet-scale commonsense of large models with retrieved, embodied geometric-semantic experience.
The method incrementally accumulates past navigation episodes into a “lifelong” knowledge base by converting raw observations into a compact topological-polar trajectory representation.
A hierarchical chunking approach groups similar trajectories into unified summaries, enabling coarse-to-fine retrieval of relevant experiences.
During navigation, candidate frontiers generate multiple trajectory hypotheses, which query TrajRAG for similar past trajectories; the retrieved experiences then guide large-model reasoning for waypoint selection.
Experiments on MP3D, HM3D-v1, and HM3D-v2 indicate that retrieving geometric-semantic experience improves zero-shot ObjectNav performance.
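
The coarse-to-fine retrieval described in the points above can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding vectors, the mean-vector chunk summary, the cosine similarity, and the names Chunk, build_chunk, and retrieve are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Chunk:
    # Assumption: a chunk summary is the mean of its member embeddings.
    summary: list
    members: list = field(default_factory=list)  # (episode_id, embedding) pairs


def build_chunk(members):
    """Group trajectories into one chunk and compute its summary vector."""
    dim = len(members[0][1])
    summary = [sum(vec[i] for _, vec in members) / len(members) for i in range(dim)]
    return Chunk(summary, list(members))


def retrieve(query, chunks, top_chunks=1, top_k=2):
    """Coarse step: rank chunk summaries. Fine step: rank members of the best chunks."""
    coarse = sorted(chunks, key=lambda c: cosine(query, c.summary), reverse=True)
    candidates = [m for c in coarse[:top_chunks] for m in c.members]
    fine = sorted(candidates, key=lambda m: cosine(query, m[1]), reverse=True)
    return [episode_id for episode_id, _ in fine[:top_k]]


# Hypothetical two-dimensional embeddings for four past trajectories.
kitchen = build_chunk([("ep1", [1.0, 0.0]), ("ep2", [0.9, 0.1])])
bedroom = build_chunk([("ep3", [0.0, 1.0]), ("ep4", [0.1, 0.9])])
result = retrieve([1.0, 0.05], [kitchen, bedroom])
```

Ranking summaries first means the fine-grained comparison only touches trajectories inside the most promising chunks, which is the usual motivation for a hierarchical index as the knowledge base grows.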

Abstract

Existing zero-shot Object Goal Navigation (ObjectNav) methods often exploit commonsense knowledge from large language or vision-language models to guide navigation. However, such knowledge arises from internet-scale text rather than embodied 3D experience, and episodic observations collected during navigation are typically discarded, preventing the accumulation of lifelong experience. To this end, we propose Trajectory RAG (TrajRAG), a retrieval-augmented generation framework that enhances large-model reasoning by retrieving geometric-semantic experiences. TrajRAG incrementally accumulates episodic observations from past navigation episodes. To structure these observations, we propose a topological-polar (topo-polar) trajectory representation that compactly encodes spatial layouts and semantic contexts, effectively removing redundancies in raw episodic observations. A hierarchical chunking structure further organizes similar topo-polar trajectories into unified summaries, enabling coarse-to-fine retrieval. During navigation, candidate frontiers generate multiple trajectory hypotheses that query TrajRAG for similar past trajectories, guiding large-model reasoning for waypoint selection. New experiences are continually consolidated into TrajRAG, enabling the accumulation of lifelong navigation experience. Experiments on MP3D, HM3D-v1, and HM3D-v2 show that TrajRAG effectively retrieves relevant geometric-semantic experiences and improves zero-shot ObjectNav performance.
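
The abstract does not spell out the topo-polar encoding, so the following is only a hedged sketch of one way a polar, semantics-aware node could remove redundancy from raw observations. The sector count, the ring edges, and the functions polar_bin and encode_node are hypothetical choices, not details from the paper.

```python
from math import atan2, hypot, pi


def polar_bin(agent_xy, heading, obj_xy, n_sectors=8, ring_edges=(1.0, 3.0)):
    """Map an object position to (angular sector, distance ring) relative to the agent."""
    dx = obj_xy[0] - agent_xy[0]
    dy = obj_xy[1] - agent_xy[1]
    # Bearing relative to the agent's heading, wrapped into [0, 2*pi).
    angle = (atan2(dy, dx) - heading) % (2 * pi)
    sector = int(angle / (2 * pi / n_sectors)) % n_sectors
    # Ring index = number of ring edges the distance has reached or passed.
    ring = sum(hypot(dx, dy) >= edge for edge in ring_edges)
    return sector, ring


def encode_node(agent_xy, heading, objects):
    """Collapse (label, position) observations into a {polar cell: labels} summary."""
    node = {}
    for label, xy in objects:
        node.setdefault(polar_bin(agent_xy, heading, xy), set()).add(label)
    return node


# Two nearby chair detections fall into the same polar cell and are merged.
observations = [("chair", (2.0, 0.0)), ("chair", (2.1, 0.1)), ("sofa", (0.2, 0.5))]
node = encode_node((0.0, 0.0), 0.0, observations)
```

A trajectory would then be a sequence of such nodes linked by topological edges; repeated detections of the same object from similar viewpoints contribute one entry per cell rather than one per frame.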


