Event-Centric World Modeling with Memory-Augmented Retrieval for Embodied Decision-Making

arXiv cs.RO / 4/10/2026

Key Points

  • The paper proposes an event-centric world modeling framework for embodied agents that represents the environment as structured semantic events encoded into a permutation-invariant latent representation.
  • Instead of end-to-end policies, decision-making is performed via memory-augmented retrieval from a knowledge bank of prior event-to-maneuver experiences, producing actions as a weighted combination of retrieved solutions.
  • The approach is designed to be more interpretable than typical learning-based methods by linking current decisions explicitly to stored cases (case-based reasoning).
  • By incorporating physics-informed constraints into the retrieval process, the framework aims to select maneuvers consistent with observed system dynamics.
  • Experiments in UAV flight scenarios indicate the method can meet real-time control constraints while producing interpretable and physically consistent behavior.
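A permutation-invariant encoding, as mentioned in the first key point, can be illustrated with a minimal sketch. It assumes each event is already a small tuple of numbers and uses mean pooling over per-event features. The feature map `embed_event` and the function `encode_events` are hypothetical names chosen for illustration; the paper's actual encoder is not described in this summary.

```python
import math

def embed_event(event):
    # Hypothetical per-event feature map (an assumption, not the paper's encoder):
    # a bounded nonlinearity plus squared terms for each numeric field.
    return [math.tanh(x) for x in event] + [x * x for x in event]

def encode_events(events):
    """Mean-pool per-event embeddings so the result ignores event order."""
    if not events:
        raise ValueError("need at least one event")
    embedded = [embed_event(e) for e in events]
    dim = len(embedded[0])
    # Averaging over events is symmetric in its inputs, hence permutation-invariant.
    return [sum(vec[i] for vec in embedded) / len(embedded) for i in range(dim)]

# Same events in two different orders give the same latent vector
# (up to floating-point rounding).
latent_a = encode_events([(1.0, 2.0), (0.5, -1.0), (3.0, 0.0)])
latent_b = encode_events([(3.0, 0.0), (1.0, 2.0), (0.5, -1.0)])
```

Mean pooling is only one symmetric aggregation; sum or max pooling would also keep the order-independence property.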

Abstract

Autonomous agents operating in dynamic and safety-critical environments require decision-making frameworks that are both computationally efficient and physically grounded. However, many existing approaches rely on end-to-end learning, which often lacks interpretability and explicit mechanisms for ensuring consistency with physical constraints. In this work, we propose an event-centric world modeling framework with memory-augmented retrieval for embodied decision-making. The framework represents the environment as a structured set of semantic events, which are encoded into a permutation-invariant latent representation. Decision-making is performed via retrieval over a knowledge bank of prior experiences, where each entry associates an event representation with a corresponding maneuver. The final action is computed as a weighted combination of retrieved solutions, providing a transparent link between decision and stored experiences. The proposed design enables structured abstraction of dynamic environments and supports interpretable decision-making through case-based reasoning. In addition, incorporating physics-informed knowledge into the retrieval process encourages the selection of maneuvers that are consistent with observed system dynamics. Experimental evaluation in UAV flight scenarios demonstrates that the framework operates within real-time control constraints while maintaining interpretable and consistent behavior.
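The retrieval step described in the abstract can be sketched roughly as follows, under stated assumptions: the knowledge bank is a list of (event vector, maneuver vector) pairs, similarity is Euclidean distance, the weights come from a softmax over the k lowest-cost entries, and the physics-informed term is modeled as an optional additive penalty on each candidate maneuver. The function name `retrieve_action`, its parameters, and the penalty form are all hypothetical; the abstract does not specify these details.

```python
import math

def retrieve_action(query, bank, k=3, temperature=1.0, penalty=None):
    """Blend the k best-matching stored maneuvers into one action.

    bank: list of (event_vector, maneuver_vector) pairs (assumed layout).
    penalty: optional function maneuver -> non-negative cost, standing in
             for a physics-informed consistency check (an assumption).
    Returns (action, cases), where cases is a list of (bank_index, weight).
    """
    if not bank:
        raise ValueError("knowledge bank is empty")
    scored = []
    for idx, (key, maneuver) in enumerate(bank):
        cost = math.dist(query, key)          # distance in latent event space
        if penalty is not None:
            cost += penalty(maneuver)         # discourage implausible maneuvers
        scored.append((cost, idx))
    scored.sort()
    top = scored[:k]
    # Softmax over negative cost, shifted by the minimum for numerical stability.
    best = top[0][0]
    raw = [math.exp(-(cost - best) / temperature) for cost, _ in top]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(bank[top[0][1]][1])
    action = [
        sum(w * bank[idx][1][d] for w, (_, idx) in zip(weights, top))
        for d in range(dim)
    ]
    cases = [(idx, w) for w, (_, idx) in zip(weights, top)]
    return action, cases

# Toy bank: three stored experiences with 2-D event vectors and 2-D maneuvers.
bank = [
    ([0.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([5.0, 5.0], [9.0, 9.0]),
]
action, cases = retrieve_action([0.0, 0.0], bank, k=2)
```

Returning the retrieved indices together with their weights is what makes the decision traceable to stored cases: each action can be explained as a specific blend of named bank entries.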