Hierarchical Planning with Latent World Models

arXiv cs.LG / 4/6/2026

Key Points

  • The paper proposes hierarchical planning using latent world models to improve model predictive control for long-horizon embodied tasks, addressing error accumulation and search-space explosion in single-level approaches.
  • It learns latent world models at multiple temporal scales and performs cross-scale planning to enable long-horizon reasoning while reducing inference-time planning complexity.
  • The method is presented as a modular planning abstraction that can work across different latent world-model architectures and application domains.
  • Experiments show zero-shot real-world performance gains on non-greedy robotic tasks, including 70% success on pick-and-place with only a final goal specification versus 0% for a single-level world model.
  • In simulation benchmarks (e.g., push manipulation and maze navigation), the hierarchical approach yields higher success rates and can cut planning-time compute by up to 4x.
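
As a rough illustration of the search-space point above, here is a back-of-the-envelope count. It assumes exhaustive enumeration and is not a figure from the paper. Let $|\mathcal{A}|$ be the number of candidate actions per step at either level, $H$ the horizon in primitive steps, and $k$ the number of primitive steps covered by one coarse step. These symbols and the example numbers are assumptions chosen for illustration.

```latex
\underbrace{|\mathcal{A}|^{H}}_{\text{single-level}}
\quad\text{vs.}\quad
\underbrace{|\mathcal{A}|^{H/k}}_{\text{coarse level}}
+ \underbrace{|\mathcal{A}|^{k}}_{\text{fine level}},
\qquad
\text{e.g. } |\mathcal{A}| = 5,\ H = 12,\ k = 4:\quad
5^{12} = 244{,}140{,}625
\quad\text{vs.}\quad
5^{3} + 5^{4} = 750 .
```

Sampling-based planners do not enumerate every sequence, so practical savings are far smaller than this count suggests; the paper itself reports up to 4x less planning-time compute.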

Abstract

Model predictive control (MPC) with learned world models has emerged as a promising paradigm for embodied control, particularly for its ability to generalize zero-shot when deployed in new environments. However, learned world models often struggle with long-horizon control due to the accumulation of prediction errors and the exponentially growing search space. In this work, we address these challenges by learning latent world models at multiple temporal scales and performing hierarchical planning across these scales, enabling long-horizon reasoning while substantially reducing inference-time planning complexity. Our approach serves as a modular planning abstraction that applies across diverse latent world-model architectures and domains. We demonstrate that this hierarchical approach enables zero-shot control on real-world non-greedy robotic tasks, achieving a 70% success rate on pick-and-place using only a final goal specification, compared to 0% for a single-level world model. In addition, across physics-based simulated environments including push manipulation and maze navigation, hierarchical planning achieves higher success while requiring up to 4x less planning-time compute.
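
The two-level idea in the abstract can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions, not the paper's method: hand-written grid dynamics stand in for learned latent world models, planning is exhaustive enumeration rather than whatever optimizer the authors use, and every name (`low_model`, `high_model`, `plan`, `hierarchical_mpc`, `K`, `H_HIGH`, `H_LOW`) is hypothetical. A coarse planner picks a subgoal once every `K` primitive steps, and a fine planner replans each step toward that subgoal.

```python
from itertools import product

K = 4                 # primitive steps spanned by one coarse step (assumed)
H_HIGH, H_LOW = 3, K  # planning horizon at each level (assumed)
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # stay, right, left, up, down


def low_model(z, a):
    """Fine-scale toy dynamics: move one cell."""
    return (z[0] + a[0], z[1] + a[1])


def high_model(z, a):
    """Coarse-scale toy dynamics: one macro step covers K cells."""
    return (z[0] + K * a[0], z[1] + K * a[1])


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def plan(model, z0, target, horizon):
    """Enumerate every action sequence, roll it out in the model, and return
    (first action, first predicted state) of the cheapest rollout.
    Cost is the summed Manhattan distance to the target along the rollout."""
    best_cost, best_first = None, None
    for seq in product(ACTIONS, repeat=horizon):
        z, cost, first = z0, 0, None
        for a in seq:
            z = model(z, a)
            cost += manhattan(z, target)
            if first is None:
                first = (a, z)
        if best_cost is None or cost < best_cost:
            best_cost, best_first = cost, first
    return best_first


def hierarchical_mpc(start, goal, max_steps=100):
    """Return the list of visited states; the last one equals goal on success."""
    z, subgoal, path = start, start, [start]
    for t in range(max_steps):
        if z == goal:
            break
        if t % K == 0:  # the coarse level replans once every K primitive steps
            if manhattan(z, goal) <= K:
                subgoal = goal  # goal is within the fine planner's reach
            else:
                _, subgoal = plan(high_model, z, goal, H_HIGH)
        # Fine level: replan every step toward the current subgoal (MPC style).
        # In this toy, the fine model also plays the role of the environment.
        _, z = plan(low_model, z, subgoal, H_LOW)
        path.append(z)
    return path


if __name__ == "__main__":
    path = hierarchical_mpc((0, 0), (8, 4))
    print(len(path) - 1, "steps, final state", path[-1])
```

In this sketch, only the final goal is given to the controller; intermediate subgoals come from the coarse model's first predicted state. The open grid has no obstacles, so it does not reproduce the non-greedy tasks described in the paper, and convergence is not guaranteed for arbitrary dynamics or costs.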