RAY-TOLD: Ray-Based Latent Dynamics for Dense Dynamic Obstacle Avoidance with TDMPC

arXiv cs.RO / 5/1/2026

Key Points

  • RAY-TOLD is a hybrid navigation framework designed to improve obstacle avoidance for autonomous mobile robots in dense, dynamic crowds, where purely reactive methods can get stuck in local minima.
  • The approach combines ray/LiDAR-based latent dynamics with the long-horizon foresight of reinforcement learning, while still leveraging MPPI’s physics-based robustness.
  • By compressing high-dimensional LiDAR observations into a compact latent state, RAY-TOLD learns a terminal value function and a policy prior to better guide planning.
  • A policy mixture sampling strategy expands MPPI’s candidate trajectories with trajectories sampled from the learned policy, steering robots toward goals while keeping kinematic feasibility.
  • Experiments in a stochastic, high-density dynamic obstacle environment show RAY-TOLD lowers collision rates compared with a standard MPPI baseline, supporting improved reliability and safety.
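
The policy mixture idea in the key points above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `mppi_policy_mixture`, the `policy_frac`, `sigma`, `policy_sigma`, and `temperature` parameters, and the callables passed in (`dynamics`, `stage_cost`, `terminal_value`, `policy`) are all assumptions made for the example. It assumes a batch of candidates is split between noise-perturbed copies of the nominal plan (standard MPPI) and closed-loop rollouts of a learned policy, and that a learned terminal value is subtracted from the accumulated cost.

```python
import numpy as np

def mppi_policy_mixture(state, nominal, dynamics, stage_cost, terminal_value,
                        policy, n_samples=64, policy_frac=0.25, sigma=0.3,
                        policy_sigma=0.1, temperature=1.0, rng=None):
    """One MPPI update over a mixed candidate set (hypothetical sketch).

    nominal: (horizon, act_dim) current plan. All callables are batched:
    dynamics(states, actions) -> states, stage_cost(states, actions) -> (n,),
    terminal_value(states) -> (n,), policy(states) -> actions.
    """
    rng = np.random.default_rng() if rng is None else rng
    horizon, act_dim = nominal.shape
    n_policy = int(round(policy_frac * n_samples))
    n_noise = n_samples - n_policy

    # Standard MPPI candidates: Gaussian perturbations of the nominal plan.
    noise_actions = nominal[None] + sigma * rng.standard_normal(
        (n_noise, horizon, act_dim))
    # Policy candidates are filled in step by step during the rollout.
    actions = np.concatenate(
        [noise_actions, np.zeros((n_policy, horizon, act_dim))], axis=0)

    states = np.repeat(state[None], n_samples, axis=0)
    costs = np.zeros(n_samples)
    for t in range(horizon):
        if n_policy > 0:
            # Policy candidates act closed-loop on their own predicted states,
            # with a little noise so they are not all identical.
            proposal = policy(states[n_noise:])
            actions[n_noise:, t] = proposal + policy_sigma * rng.standard_normal(
                proposal.shape)
        costs += stage_cost(states, actions[:, t])
        states = dynamics(states, actions[:, t])

    # The learned value stands in for everything beyond the rollout horizon.
    costs -= terminal_value(states)

    # Softmin weighting; subtracting the minimum keeps exp() well-conditioned.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    plan = np.einsum("n,nta->ta", weights, actions)
    return plan, weights
```

In a receding-horizon loop, only the first row of the returned plan would be executed before replanning. The fraction of policy-derived candidates is a tuning choice in this sketch; the source does not state how the mixture is weighted.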

Abstract

Dense, dynamic crowds pose a persistent challenge for autonomous mobile robots. Purely reactive planning methods, such as Model Predictive Path Integral (MPPI) control, often fail to escape local minima in complex scenarios due to their limited prediction horizon. To bridge this gap, we propose Ray-based Task-Oriented Latent Dynamics (RAY-TOLD), a hybrid control architecture that integrates obstacle information into latent dynamics and combines the robustness of physics-based MPPI with the long-horizon foresight of reinforcement learning. RAY-TOLD leverages a LiDAR-centric latent dynamics model to encode high-dimensional sensor data into a compact state representation, enabling the learning of a terminal value function and a policy prior. We introduce a policy mixture sampling strategy that augments the MPPI candidate population with trajectories derived from the learned policy, effectively guiding the planner towards the goal while maintaining kinematic feasibility. Extensive tests in a stochastic environment with high-density dynamic obstacles demonstrate that our method outperforms the MPPI baseline, reducing the collision rate. The results confirm that blending short-horizon physics-based rollouts with learned long-horizon intent significantly enhances navigation reliability and safety.
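
For readers unfamiliar with how a terminal value function enters a sampling-based planner, the abstract's description is consistent with the following generic formulation. It is a sketch in the style of TD-MPC-like methods, not the paper's exact objective: the symbols $h_\theta$ (encoder), $d_\theta$ (latent dynamics), $c$ (stage cost), $V_\theta$ (terminal value), $H$ (horizon), and $\lambda$ (temperature) are assumed notation, not taken from the source.

```latex
\begin{align}
  z_0 &= h_\theta(o_0), \qquad z_{t+1} = d_\theta(z_t, a_t), \\
  S\bigl(\tau^{(i)}\bigr) &= \sum_{t=0}^{H-1} c\bigl(z_t^{(i)}, a_t^{(i)}\bigr) - V_\theta\bigl(z_H^{(i)}\bigr), \\
  w^{(i)} &= \frac{\exp\bigl(-S(\tau^{(i)})/\lambda\bigr)}{\sum_{j} \exp\bigl(-S(\tau^{(j)})/\lambda\bigr)},
  \qquad a_t^{\star} = \sum_{i} w^{(i)} a_t^{(i)}.
\end{align}
```

Here $o_0$ is the current LiDAR observation and $\tau^{(i)}$ is the $i$-th candidate trajectory. The term $-V_\theta(z_H)$ credits a candidate for ending in a state with high expected future return, which is how a short rollout can still reflect long-horizon intent.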