Rule-VLN: Bridging Perception and Compliance via Semantic Reasoning and Geometric Rectification

arXiv cs.RO / 4/21/2026

Key Points

The paper argues that embodied AI for Vision-and-Language Navigation (VLN) is shifting from simple reachability to “social compliance,” where agents must follow semantic regulatory constraints rather than only physical feasibility.
It introduces Rule-VLN, a new large-scale urban benchmark (29k-node environment) that injects 177 regulatory categories into 8k constrained nodes across four curriculum levels to test fine-grained visual and behavioral compliance.
To address agents’ “goal-driven trap” (prioritizing physical geometry over semantic rules), the authors propose the Semantic Navigation Rectification Module (SNRM), a universal zero-shot add-on for pre-trained agents.
SNRM combines a coarse-to-fine visual perception framework built on a vision-language model (VLM) with an epistemic mental map for dynamic detour planning. In the reported experiments it reduces CVR by 19.26% and increases TC by 5.97%.
Overall, Rule-VLN provides a stronger evaluation of rule-compliant navigation while SNRM offers a practical method to improve safety awareness in existing VLN models without retraining from scratch.

Abstract

As embodied AI transitions to real-world deployment, the success of the Vision-and-Language Navigation (VLN) task tends to evolve from mere reachability to social compliance. However, current agents suffer from a "goal-driven trap", prioritizing physical geometry ("can I go?") over semantic rules ("may I go?"), frequently overlooking subtle regulatory constraints. To bridge this gap, we establish Rule-VLN, the first large-scale urban benchmark for rule-compliant navigation. Spanning a massive 29k-node environment, it injects 177 diverse regulatory categories into 8k constrained nodes across four curriculum levels, challenging agents with fine-grained visual and behavioral constraints. We further propose the Semantic Navigation Rectification Module (SNRM), a universal, zero-shot module designed to equip pre-trained agents with safety awareness. SNRM integrates a coarse-to-fine visual perception VLM framework with an epistemic mental map for dynamic detour planning. Experiments demonstrate that while Rule-VLN challenges state-of-the-art models, SNRM significantly restores navigation capabilities, reducing CVR by 19.26% and boosting TC by 5.97%.
