Rule-VLN: Bridging Perception and Compliance via Semantic Reasoning and Geometric Rectification

arXiv cs.RO / 4/21/2026


Key Points

  • The paper argues that embodied AI for Vision-and-Language Navigation (VLN) is shifting from simple reachability to “social compliance,” where agents must follow semantic regulatory constraints rather than only physical feasibility.
  • It introduces Rule-VLN, a new large-scale urban benchmark (29k-node environment) that injects 177 regulatory categories into 8k constrained nodes across four curriculum levels to test fine-grained visual and behavioral compliance.
  • To address agents’ “goal-driven trap” (prioritizing physical geometry over semantic rules), the authors propose the Semantic Navigation Rectification Module (SNRM), a universal, zero-shot add-on for pre-trained agents.
  • SNRM combines a coarse-to-fine, VLM-based visual perception framework with an epistemic mental map for dynamic detour planning. In experiments it restores navigation performance, reducing CVR by 19.26% and increasing TC by 5.97%.
  • Overall, Rule-VLN provides a stronger evaluation of rule-compliant navigation while SNRM offers a practical method to improve safety awareness in existing VLN models without retraining from scratch.

Abstract

As embodied AI transitions to real-world deployment, the success of the Vision-and-Language Navigation (VLN) task tends to evolve from mere reachability to social compliance. However, current agents suffer from a "goal-driven trap", prioritizing physical geometry ("can I go?") over semantic rules ("may I go?"), frequently overlooking subtle regulatory constraints. To bridge this gap, we establish Rule-VLN, the first large-scale urban benchmark for rule-compliant navigation. Spanning a massive 29k-node environment, it injects 177 diverse regulatory categories into 8k constrained nodes across four curriculum levels, challenging agents with fine-grained visual and behavioral constraints. We further propose the Semantic Navigation Rectification Module (SNRM), a universal, zero-shot module designed to equip pre-trained agents with safety awareness. SNRM integrates a coarse-to-fine visual perception VLM framework with an epistemic mental map for dynamic detour planning. Experiments demonstrate that while Rule-VLN challenges state-of-the-art models, SNRM significantly restores navigation capabilities, reducing CVR by 19.26% and boosting TC by 5.97%.
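
To make the idea of a mental map driving detour planning more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the environment is a plain adjacency-list graph, that some perception step (not shown) reports whether a node carries a regulatory restriction, and that replanning is a breadth-first search that skips restricted nodes. The class name `MentalMap`, its methods, and the toy graph are all hypothetical.

```python
from collections import deque


class MentalMap:
    """Hypothetical belief store: nodes the agent currently believes are restricted."""

    def __init__(self, graph):
        self.graph = graph        # adjacency list: node -> list of neighbouring nodes
        self.restricted = set()   # nodes believed to violate a rule if entered

    def observe(self, node, rule_detected):
        """Update the belief about one node from a (hypothetical) perception result."""
        if rule_detected:
            self.restricted.add(node)
        else:
            self.restricted.discard(node)

    def plan(self, start, goal):
        """Breadth-first search for a fewest-hop path that never enters a restricted node.

        Returns the path as a list of nodes, or None if no compliant path exists.
        """
        parents = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            for nxt in self.graph.get(node, []):
                if nxt not in parents and nxt not in self.restricted:
                    parents[nxt] = node
                    queue.append(nxt)
        return None


# Toy graph (assumed for illustration): two routes from A to D.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "E": ["C", "D"],
    "D": ["B", "E"],
}

mental_map = MentalMap(graph)
direct = mental_map.plan("A", "D")   # shortest route while nothing is restricted
mental_map.observe("B", True)        # perception flags node B as off-limits
detour = mental_map.plan("A", "D")   # replanned route that avoids B
print(direct, detour)
```

In this toy setup the agent initially routes through `B`, then takes the longer route through `C` and `E` once `B` is marked as restricted. How SNRM actually represents its beliefs and plans detours is described only at a high level in the abstract, so the data structure and search strategy here are stand-ins.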