Beyond Specialization: Robust Reinforcement Learning Navigation via Procedural Map Generators

arXiv cs.RO / 5/5/2026


Key Points

  • The study addresses a key limitation of deep reinforcement learning (DRL) navigation—overfitting to the limited structure of manually designed training environments—by using procedurally generated maps with guaranteed navigability.
  • The authors built MuRoSim, a 2D simulator focused on training efficiency for LiDAR-based navigation, integrating four procedural map generator types (sparse, maze, graph, and Wave Function Collapse), and cross-evaluated five navigation policies on 1,000 seeded maps per generator across three training seeds (a minimal generator sketch follows this list).
  • Cross-generator transfer proved highly asymmetric: a policy specialized to sparse layouts dropped to 3.3% success on maze maps, while training on the combined generator set reached 91.5 ± 1.1% mean success (see the evaluation sketch after this list).
  • Robustness was driven mainly by A* path-planner subgoal inputs, which raised success from the 90.2 ± 1.4% feedforward baseline to 98.9 ± 0.4%; GRU recurrence improved only the reactive baseline (the subgoal mechanism is sketched after the abstract).
  • The learned policies outperformed a classical Carrot+A* controller, which matched their success only at low speed (1.0 m/s) and collapsed to 24.9% at 2.0 m/s, making learned speed adaptation the decisive advantage; real-world tests on a RoboMaster confirmed sim-to-real transfer in a cluttered arena, while a maze-like layout exposed residual failure modes that recurrence helps mitigate.
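
The "guaranteed navigability" the paper relies on can be obtained by construction. Below is a minimal, hedged sketch of a maze-style generator in the recursive-backtracker family: the carved passages form a spanning tree over the cell grid, so every free cell is reachable from every other, and any sampled start/goal pair is solvable. Names and grid conventions are illustrative assumptions, not MuRoSim's actual generator.

```python
import random

def generate_maze(cells_w: int, cells_h: int, seed: int) -> list[list[int]]:
    """Return a (2*cells_h+1) x (2*cells_w+1) occupancy grid: 1 = wall, 0 = free."""
    rng = random.Random(seed)          # seeded, so maps are reproducible
    grid = [[1] * (2 * cells_w + 1) for _ in range(2 * cells_h + 1)]
    grid[1][1] = 0                     # open the start cell
    visited = {(0, 0)}
    stack = [(0, 0)]
    while stack:
        cx, cy = stack[-1]
        # Unvisited 4-neighbours of the current cell.
        options = [(cx + dx, cy + dy)
                   for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                   if 0 <= cx + dx < cells_w and 0 <= cy + dy < cells_h
                   and (cx + dx, cy + dy) not in visited]
        if not options:
            stack.pop()                # dead end: backtrack
            continue
        nx, ny = rng.choice(options)
        grid[cy + ny + 1][cx + nx + 1] = 0   # knock out the shared wall
        grid[2 * ny + 1][2 * nx + 1] = 0     # open the newly reached cell
        visited.add((nx, ny))
        stack.append((nx, ny))
    return grid
```

Because the depth-first carve visits every cell exactly once, the free space is connected by construction; no post-hoc reachability check is needed before placing start and goal.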

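The cross-generator protocol itself is simple to express: roll every policy out on the same seeded maps from every generator and aggregate success per (policy, generator) pair. In this sketch, `make_map`, `run_episode`, and the `policies` dict are hypothetical placeholders rather than MuRoSim's API; only the 1,000-maps-per-generator count comes from the paper.

```python
from statistics import mean

GENERATORS = ("sparse", "maze", "graph", "wfc")   # wfc = Wave Function Collapse

def cross_evaluate(policies, make_map, run_episode, n_maps=1000):
    """Mean success of every policy on every generator's seeded map set."""
    results = {}
    for name, policy in policies.items():
        for gen in GENERATORS:
            # Seed = map index, so every policy is scored on identical maps.
            successes = [run_episode(policy, make_map(gen, seed=i))
                         for i in range(n_maps)]
            results[(name, gen)] = mean(1.0 if s else 0.0 for s in successes)
    return results
```

Fixing the seed per map index is what makes the transfer matrix comparable across policies: a specialist and a generalist are scored on identical layouts, so the reported asymmetry reflects the policies, not the map draw.
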
Abstract

Deep reinforcement learning (DRL) navigation policies often overfit to the structure of their training environments, as environmental diversity is typically constrained by the manual effort required to design diverse scenarios. While procedural map generation offers scalable diversity, no prior work systematically compares how different generator types affect policy generalization. We integrate four generators (sparse, maze, graph, and Wave Function Collapse) with guaranteed navigability into MuRoSim, a 2D simulator focusing on training efficiency for LiDAR-based navigation. We cross-evaluate five navigation policies on 1000 seeded maps per generator across three training seeds. Results show a strongly asymmetric cross-generator transfer: a specialist trained on sparse layouts falls to 3.3% success on mazes, whereas a policy trained on the combined generator set achieves 91.5 ± 1.1% mean success. We further demonstrate that A* path-planner subgoal inputs are the dominant factor for robustness, raising success from the 90.2 ± 1.4% feedforward baseline to 98.9 ± 0.4% and outperforming GRU recurrence, which only improves the reactive baseline. The DRL policies outperform a classical Carrot+A* controller, which matches their success only at low speeds (1.0 m/s) but collapses to 24.9% at 2.0 m/s. This highlights learned speed adaptation as the decisive advantage of the learned approach. Real-world experiments on a RoboMaster confirm sim-to-real transfer in a cluttered arena, while a maze-like layout exposes remaining failure modes that recurrence helps mitigate.
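
To make the "A* path-planner subgoal inputs" concrete: the idea is to plan a coarse grid path with A* and let the policy observe a near-term waypoint on that path rather than only the distant goal, offloading global reasoning to the planner. The sketch below assumes an occupancy-grid map and a fixed 5-cell lookahead; both are illustrative choices, since the paper states only that planner subgoals are part of the observation.

```python
import heapq

def astar(grid, start, goal):
    """4-connected A* on an occupancy grid (0 = free, 1 = wall); returns a cell path."""
    def h(c):  # Manhattan distance, an admissible heuristic on a 4-connected grid
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])
    frontier = [(h(start), 0, start)]
    came_from, best_g = {start: None}, {start: 0}
    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur == goal:                      # reconstruct path start -> goal
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dx, cur[1] + dy)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0
                    and g + 1 < best_g.get(nxt, float("inf"))):
                best_g[nxt] = g + 1
                came_from[nxt] = cur
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return []   # unreachable; excluded by construction on navigability-guaranteed maps

def planner_subgoal(grid, robot_cell, goal_cell, lookahead=5):
    """The policy observes this near-term path cell instead of only the final goal."""
    path = astar(grid, robot_cell, goal_cell)
    return path[min(lookahead, len(path) - 1)] if path else goal_cell
```

The classical Carrot+A* baseline tracks a similar lookahead point (the "carrot") on an A* path, but with a fixed control law; per the abstract, it matches the learned policies only at 1.0 m/s and collapses to 24.9% success at 2.0 m/s.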