HCLSM: Hierarchical Causal Latent State Machines for Object-Centric World Modeling
arXiv cs.RO / 4/1/2026
Key Points
- The paper introduces HCLSM, an object-centric world modeling architecture that addresses limitations of “flat” latent states by decomposing scenes into slots, modeling temporal dynamics hierarchically, and learning causal structure via interaction graphs.
- HCLSM combines three coordinated components: slot attention with spatial broadcast decoding for object discovery; a three-level temporal engine (SSMs for continuous physics, sparse transformers for discrete events, and compressed transformers for abstract goals) that avoids collapsing time into a single scale; and GNN-based interaction graphs for causal structure (a minimal slot-attention sketch follows this list).
- The causal component applies graph-neural-network-style message passing over slot pairs to infer which objects influence which, producing learned event boundaries during training and improved next-state prediction (see the interaction-graph sketch below).
- Experiments on the PushT robotic manipulation benchmark (Open X-Embodiment) report a next-state prediction MSE of 0.008, alongside effective spatial decomposition (spatial broadcast decoder loss of 0.0075).
- The work also includes substantial systems engineering, notably a custom Triton kernel for the SSM scan that reportedly achieves a 38× speedup, and ships code with a fairly rigorous test suite (the scan sketch below shows what such a kernel parallelizes).
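
To make the object-discovery component concrete, here is a minimal slot-attention sketch in PyTorch, in the spirit of Locatello et al. (2020), which this line of work builds on. The dimensions, iteration count, and the omission of the usual residual MLP are illustrative assumptions, not HCLSM's actual configuration.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Minimal slot attention (after Locatello et al., 2020).

    Illustrative sketch: sizes, iteration count, and the omission of
    the usual residual MLP are assumptions, not HCLSM's configuration.
    """
    def __init__(self, num_slots=5, dim=64, iters=3):
        super().__init__()
        self.iters = iters
        self.scale = dim ** -0.5
        self.slots_init = nn.Parameter(torch.randn(1, num_slots, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)

    def forward(self, feats):  # feats: (B, N, dim) flattened encoder features
        B = feats.shape[0]
        feats = self.norm_in(feats)
        k, v = self.to_k(feats), self.to_v(feats)
        slots = self.slots_init.expand(B, -1, -1)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            # Softmax over the *slot* axis makes slots compete for inputs.
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)
            attn = attn / attn.sum(dim=-1, keepdim=True)  # renormalize per slot
            updates = attn @ v                            # (B, S, dim)
            slots = self.gru(
                updates.reshape(-1, updates.shape[-1]),
                slots.reshape(-1, slots.shape[-1]),
            ).view(B, -1, updates.shape[-1])
        return slots  # (B, S, dim) object-centric latents
```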
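The paper describes its causal component only as GNN interaction inference; the sketch below shows one plausible form under that description: pairwise message passing between slots with a learned sigmoid gate per directed edge, where the gate matrix can be read as a soft interaction graph. The architecture and layer sizes here are assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class InteractionGraph(nn.Module):
    """Gated pairwise message passing between slots.

    One plausible reading of "GNN interaction patterns": each directed
    slot pair gets a message scaled by a learned sigmoid gate, and the
    gate matrix doubles as a soft interaction (causal) graph. All layer
    sizes here are illustrative assumptions.
    """
    def __init__(self, dim=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.edge_gate = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())
        self.node_mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, slots):  # slots: (B, S, dim)
        B, S, D = slots.shape
        send = slots.unsqueeze(2).expand(B, S, S, D)  # pair[i, j]: sender   = slot i
        recv = slots.unsqueeze(1).expand(B, S, S, D)  # pair[i, j]: receiver = slot j
        pair = torch.cat([send, recv], dim=-1)        # (B, S, S, 2*dim)
        gates = self.edge_gate(pair)                  # (B, S, S, 1) soft edges
        msgs = self.edge_mlp(pair) * gates            # gated messages
        agg = msgs.sum(dim=1)                         # sum over senders -> (B, S, dim)
        slots = slots + self.node_mlp(torch.cat([slots, agg], dim=-1))
        return slots, gates.squeeze(-1)               # updated slots, soft adjacency
```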
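The 38× Triton speedup concerns the SSM scan. The linear recurrence h_t = a_t ⊙ h_{t-1} + b_t is associative, so it can be computed in O(log T) parallel steps rather than a sequential loop; that restructuring is what a fused GPU kernel exploits. Below is a reference loop and a Hillis-Steele-style associative scan in plain PyTorch as a sketch of the idea; the authors' actual kernel is not reproduced here.

```python
import torch

def ssm_scan_naive(a, bx):
    """Sequential reference: h_t = a_t * h_{t-1} + bx_t (diagonal SSM).

    a, bx: (B, T, D). This Python loop is the kind of baseline a fused
    GPU kernel replaces; the 38x figure is the paper's claim, not ours.
    """
    h = torch.zeros_like(bx[:, 0])
    out = []
    for t in range(bx.shape[1]):
        h = a[:, t] * h + bx[:, t]
        out.append(h)
    return torch.stack(out, dim=1)

def ssm_scan_assoc(a, bx):
    """Hillis-Steele associative scan over the same recurrence.

    Composition rule: (a2, b2) after (a1, b1) = (a2*a1, a2*b1 + b2),
    so all T states arrive in O(log T) doubling steps. This is the
    structure a Triton kernel can fuse on-GPU.
    """
    A, Bv = a.clone(), bx.clone()
    T = bx.shape[1]
    step = 1
    while step < T:
        A_new, B_new = A.clone(), Bv.clone()
        B_new[:, step:] = A[:, step:] * Bv[:, :-step] + Bv[:, step:]
        A_new[:, step:] = A[:, step:] * A[:, :-step]
        A, Bv = A_new, B_new
        step *= 2
    return Bv  # matches ssm_scan_naive(a, bx) up to float error
```

A quick sanity check: for random `a` and `bx`, `torch.allclose(ssm_scan_naive(a, bx), ssm_scan_assoc(a, bx), atol=1e-5)` should hold.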