Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance

arXiv cs.RO / 4/30/2026


Key Points

  • The paper introduces “Walk with Me,” a map-free framework that turns high-level natural-language intentions into safe, long-horizon, socially compliant robot navigation in open outdoor environments.
  • It uses GPS context plus lightweight candidate points-of-interest from a public map API to ground abstract instructions into concrete destinations and propose coarse waypoint sequences.
  • A high-level vision-language model converts the user’s instructions into specific goals and coarse plans, while an observation-aware mechanism decides whether to rely on the low-level policy or invoke higher-level safety reasoning.
  • Routine navigation segments are handled by a low-level vision-language-action policy, whereas complex, unsafe scenarios (e.g., crowded crossings) trigger explicit reasoning and stop-and-wait behavior.
  • The approach aims to bridge the gap between HD-map-based outdoor systems and learning-based methods that are typically limited to indoor or short-horizon settings.
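The paper's implementation details are not given in this summary, but the two-tier dispatch described above can be sketched roughly. Everything here is illustrative: the `risk_score` heuristic, the 0.5 threshold, and the observation fields are assumptions standing in for the paper's observation-aware routing mechanism.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    pedestrian_count: int
    at_crossing: bool

def risk_score(obs: Observation) -> float:
    """Toy heuristic standing in for the observation-aware router (assumed)."""
    score = 0.1 * obs.pedestrian_count
    if obs.at_crossing:
        score += 0.4
    return min(score, 1.0)

def dispatch(obs: Observation, threshold: float = 0.5) -> str:
    """Route to the low-level VLA policy or to high-level safety reasoning."""
    if risk_score(obs) < threshold:
        return "low_level_vla"  # routine segment: the VLA policy executes directly
    # Complex or unsafe scene: invoke explicit high-level reasoning; at a
    # crowded crossing this can resolve to stop-and-wait behavior.
    return "stop_and_wait" if obs.at_crossing else "high_level_reasoning"

print(dispatch(Observation(pedestrian_count=1, at_crossing=False)))  # low_level_vla
print(dispatch(Observation(pedestrian_count=8, at_crossing=True)))   # stop_and_wait
```

The design point the sketch captures is that the expensive high-level VLM is only consulted when the cheap router flags the scene as risky; routine segments never leave the low-level policy.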

Abstract

Assisting humans in open-world outdoor environments requires robots to translate high-level natural-language intentions into safe, long-horizon, and socially compliant navigation behavior. Existing map-based methods rely on costly pre-built HD maps, while learning-based policies are mostly limited to indoor and short-horizon settings. To bridge this gap, we propose Walk with Me, a map-free framework for long-horizon social navigation from high-level human instructions. Walk with Me leverages GPS context and lightweight candidate points-of-interest from a public map API for semantic destination grounding and waypoint proposal. A High-Level Vision-Language Model grounds abstract instructions into concrete destinations and plans coarse waypoint sequences. During execution, an observation-aware routing mechanism determines whether the Low-Level Vision-Language-Action policy can handle the current situation or whether explicit safety reasoning from the High-Level VLM is needed. Routine segments are executed by the Low-Level VLA, while complex situations such as crowded crossings trigger high-level reasoning and stop-and-wait behavior when unsafe. By combining semantic intent grounding, map-free long-horizon planning, safety-aware reasoning, and low-level action generation, Walk with Me enables practical outdoor social navigation for human-centric assistance.
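As a rough illustration of the semantic destination grounding step, once the high-level VLM has mapped an abstract instruction (e.g., "take me somewhere for coffee") to a POI category, the nearest matching candidate from the map API can be selected by GPS distance. The summary does not name the map API or the selection rule, so `CandidatePOI`, the category matching, and the nearest-distance criterion are all assumptions for this sketch.

```python
import math
from typing import List, Tuple

class CandidatePOI:
    """Lightweight candidate point of interest (hypothetical schema)."""
    def __init__(self, name: str, category: str, lat: float, lon: float):
        self.name, self.category = name, category
        self.lat, self.lon = lat, lon

def haversine_m(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in meters between two (lat, lon) pairs."""
    r = 6371000.0
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dp = math.radians(b[0] - a[0])
    dl = math.radians(b[1] - a[1])
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

def ground_destination(inferred_category: str,
                       robot_gps: Tuple[float, float],
                       candidates: List[CandidatePOI]) -> CandidatePOI:
    """Pick the nearest candidate POI matching the VLM-inferred category."""
    matches = [c for c in candidates if c.category == inferred_category]
    return min(matches, key=lambda c: haversine_m(robot_gps, (c.lat, c.lon)))

pois = [
    CandidatePOI("Corner Cafe", "cafe", 40.0005, -75.0000),
    CandidatePOI("City Pharmacy", "pharmacy", 40.0010, -75.0010),
    CandidatePOI("Park Cafe", "cafe", 40.0030, -75.0030),
]
dest = ground_destination("cafe", (40.0, -75.0), pois)
print(dest.name)  # Corner Cafe
```

In the actual framework the chosen destination would then seed the coarse waypoint sequence that the low-level VLA policy executes segment by segment.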