Physically Native World Models: A Hamiltonian Perspective on Generative World Modeling

arXiv cs.AI / 5/4/2026


Key Points

  • The paper argues that the main bottleneck for world models is shifting from realistic future generation to producing physically meaningful, action-controllable, long-horizon-stable predictions for embodied decision-making.
  • It proposes “Hamiltonian World Models,” which encode observations into a structured latent phase space and evolve it using Hamiltonian-inspired dynamics that include control, dissipation, and residual terms.
  • Predicted latent trajectories are decoded into future observations, and the resulting rollouts are intended to be used directly for planning.
  • The authors suggest that Hamiltonian structure may improve interpretability, data efficiency, and long-horizon stability, while acknowledging open challenges in real-world robotic scenes: friction, contact, non-conservative forces, and deformable objects.
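
The summary above does not state the equations. As a sketch only, one common way to write Hamiltonian dynamics with control, dissipation, and a residual term is the port-Hamiltonian-style form below. Every symbol here is an assumption introduced for illustration, not the paper's notation: latent position and momentum $(q, p)$, a learned energy function $H_\theta$, a positive semi-definite dissipation matrix $D$, an input matrix $G$, an action $u$, a learned residual $r_\theta$, an encoder $E_\phi$, and a decoder $\mathrm{Dec}_\psi$.

```latex
% Assumed notation; illustrative, not taken from the paper.
\begin{aligned}
(q_0, p_0) &= E_\phi(o_0)
  && \text{encode the observation into latent phase space} \\
\dot{q} &= \frac{\partial H_\theta}{\partial p}
  && \text{conservative flow} \\
\dot{p} &= -\frac{\partial H_\theta}{\partial q}
           - D(q)\,\frac{\partial H_\theta}{\partial p}
           + G(q)\,u
           + r_\theta(q, p, u)
  && \text{dissipation, control, residual} \\
\hat{o}_t &= \mathrm{Dec}_\psi(q_t, p_t)
  && \text{decode the predicted state}
\end{aligned}
```

With $u = 0$ and $r_\theta = 0$, this form gives $\dot{H}_\theta = -\left(\partial H_\theta / \partial p\right)^{\top} D \left(\partial H_\theta / \partial p\right) \le 0$, so the latent energy cannot grow. That property is one plausible reading of why such structure could help long-horizon stability.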

Abstract

World models have recently re-emerged as a central paradigm for embodied intelligence, robotics, autonomous driving, and model-based reinforcement learning. However, current world model research is often dominated by three partially separated routes: 2D video-generative models that emphasize visual future synthesis, 3D scene-centric models that emphasize spatial reconstruction, and JEPA-like latent models that emphasize abstract predictive representations. While each route has made important progress, they still struggle to provide physically reliable, action-controllable, and long-horizon-stable predictions for embodied decision-making. In this paper, we argue that the bottleneck of world models is no longer only whether they can generate realistic futures, but whether those futures are physically meaningful and useful for action. We propose Hamiltonian World Models as a physically grounded perspective on world modeling. The key idea is to encode observations into a structured latent phase space, evolve the latent state through Hamiltonian-inspired dynamics with control, dissipation, and residual terms, decode the predicted trajectory into future observations, and use the resulting rollouts for planning. We discuss how Hamiltonian structure may improve interpretability, data efficiency, and long-horizon stability, while also noting practical challenges in real-world robotic scenes involving friction, contact, non-conservative forces, and deformable objects.
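
To make the "evolve the latent state" step concrete, here is a minimal Python sketch under strong simplifying assumptions. It is not the authors' implementation. The latent state is a single position and momentum pair, the Hamiltonian is a fixed toy quadratic rather than a learned network, the encoder, decoder, and planner are omitted, and the names `grad_H`, `energy`, `step`, and `rollout` are hypothetical. The integrator is semi-implicit (symplectic) Euler, chosen here because it keeps energy bounded over long rollouts when damping and control are zero.

```python
import numpy as np


def grad_H(q, p, k=1.0):
    """Gradients (dH/dq, dH/dp) of the toy Hamiltonian H = 0.5*p**2 + 0.5*k*q**2."""
    return k * q, p


def energy(q, p, k=1.0):
    """Value of the toy Hamiltonian; works on scalars or NumPy arrays."""
    return 0.5 * p**2 + 0.5 * k * q**2


def step(q, p, u, dt=0.01, damping=0.1, k=1.0, residual=None):
    """One semi-implicit Euler step with control, dissipation, and residual.

    `residual` is an optional callable (q, p, u) -> float standing in for a
    learned correction term; it defaults to zero (an assumption of this sketch).
    """
    dH_dq, dH_dp = grad_H(q, p, k)
    r = 0.0 if residual is None else residual(q, p, u)
    # Momentum update: conservative force, damping, control input, residual.
    p_next = p + dt * (-dH_dq - damping * dH_dp + u + r)
    # Position update uses the *new* momentum (this makes the scheme symplectic
    # in the undamped, uncontrolled case).
    _, dH_dp_next = grad_H(q, p_next, k)
    q_next = q + dt * dH_dp_next
    return q_next, p_next


def rollout(q0, p0, controls, **step_kwargs):
    """Roll the latent state forward under a sequence of control inputs.

    Returns an array of shape (len(controls) + 1, 2) holding (q, p) per step.
    """
    q, p = q0, p0
    traj = [(q, p)]
    for u in controls:
        q, p = step(q, p, u, **step_kwargs)
        traj.append((q, p))
    return np.array(traj)


if __name__ == "__main__":
    # Undamped, uncontrolled rollout: energy should stay close to its start value.
    free = rollout(1.0, 0.0, np.zeros(10_000), dt=0.01, damping=0.0)
    e_free = energy(free[:, 0], free[:, 1])
    print("max energy drift (no damping):", np.max(np.abs(e_free - e_free[0])))

    # Damped rollout: energy should decay toward zero.
    damped = rollout(1.0, 0.0, np.zeros(5_000), dt=0.01, damping=0.1)
    print("final energy (damping=0.1):", energy(*damped[-1]))
```

In a full model along the lines the abstract describes, `grad_H` would come from differentiating a learned energy function, the initial `(q0, p0)` would come from an encoder, and a planner would search over `controls` by scoring decoded rollouts.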