Learning Lifted Action Models from Unsupervised Visual Traces

arXiv cs.AI / 4/22/2026


Key Points

  • The paper proposes learning “lifted action models” for AI planning from sequences of state images alone, with no observation of the actions taken.
  • It introduces a deep learning framework that jointly trains state prediction, action prediction, and the lifted (symbolic/parameterized) action model in one end-to-end setup.
  • To avoid prediction collapse and self-reinforcing errors, the authors add a mixed-integer linear program (MILP) that solves for logically consistent states, actions, and action-model parameters as close as possible to the network’s raw predictions.
  • MILP-derived pseudo-labels are then fed back into training; experiments across multiple domains show that this correction helps the model escape local optima and converge toward globally consistent solutions.
  • Overall, the work advances unsupervised/weakly supervised learning of action dynamics needed for real-world planning systems.
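
To make the term "lifted" concrete, here is a minimal sketch of a STRIPS-style action schema whose preconditions and effects mention parameters rather than specific objects. The representation, the `LiftedAction` class, the `apply` helper, and the block-moving example are all illustrative assumptions; the summary does not say which formalism or domains the paper uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiftedAction:
    """A STRIPS-style schema: atoms refer to parameters, not concrete objects.

    An atom is a pair (predicate_name, argument_tuple).
    This class is a hypothetical illustration, not the paper's representation.
    """
    name: str
    params: tuple
    preconditions: frozenset  # atoms that must hold before the action
    add_effects: frozenset    # atoms the action makes true
    del_effects: frozenset    # atoms the action makes false

    def ground(self, objects):
        """Substitute concrete objects for the parameters in every atom."""
        binding = dict(zip(self.params, objects))

        def substitute(atoms):
            return frozenset(
                (pred, tuple(binding[arg] for arg in args)) for pred, args in atoms
            )

        return (
            substitute(self.preconditions),
            substitute(self.add_effects),
            substitute(self.del_effects),
        )


def apply(action, objects, state):
    """Return the successor state, or None if the preconditions do not hold."""
    pre, add, delete = action.ground(objects)
    if not pre <= state:
        return None
    return (state - delete) | add


# Hypothetical toy schema: move block ?x from ?y onto ?z.
move = LiftedAction(
    name="move",
    params=("?x", "?y", "?z"),
    preconditions=frozenset(
        {("on", ("?x", "?y")), ("clear", ("?x",)), ("clear", ("?z",))}
    ),
    add_effects=frozenset({("on", ("?x", "?z")), ("clear", ("?y",))}),
    del_effects=frozenset({("on", ("?x", "?y")), ("clear", ("?z",))}),
)

state = frozenset({("on", ("a", "b")), ("clear", ("a",)), ("clear", ("c",))})
next_state = apply(move, ("a", "b", "c"), state)
```

One schema covers every object combination, which is why a lifted model can generalize to problem instances with different objects than those seen in the training traces.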

Abstract

Efficient construction of models capturing the preconditions and effects of actions is essential for applying AI planning in real-world domains. Extensive prior work has explored learning such models from high-level descriptions of state and/or action sequences. In this paper, we tackle a more challenging setting: learning lifted action models from sequences of state images, without action observation. We propose a deep learning framework that jointly learns state prediction, action prediction, and a lifted action model. We also introduce a mixed-integer linear program (MILP) to prevent prediction collapse and self-reinforcing errors among predictions. The MILP takes the predicted states, actions, and action model over a subset of traces and solves for logically consistent states, actions, and action model that are as close as possible to the original predictions. Pseudo-labels extracted from the MILP solution are then used to guide further training. Experiments across multiple domains show that integrating MILP-based correction helps the model escape local optima and converge toward globally consistent solutions.
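
The correction step described in the abstract can be pictured as a projection: given soft predictions, find the nearest assignment that obeys the action semantics. The sketch below is a simplified stand-in, not the paper's MILP. It enumerates candidates by brute force instead of calling a solver, holds a small hypothetical ground action model (`ACTIONS`) fixed rather than solving for it, and uses an L1 distance chosen purely for illustration.

```python
from itertools import product

# Hypothetical ground action model over atoms 0..2: name -> (pre, add, delete).
# The paper's MILP also solves for the action model; here it is held fixed.
ACTIONS = {
    "a0": ({0}, {1}, {0}),
    "a1": ({1}, {2}, {1}),
}
N_ATOMS = 3


def rollout(init, plan):
    """Simulate a plan from init; return all states, or None if a precondition fails."""
    states = [frozenset(init)]
    for name in plan:
        pre, add, delete = ACTIONS[name]
        if not pre <= states[-1]:
            return None
        states.append((states[-1] - delete) | add)
    return states


def project(state_probs, action_probs):
    """Return (cost, states, plan) for the consistent trace nearest the predictions."""
    horizon = len(action_probs)
    best = None
    for bits in product([0, 1], repeat=N_ATOMS):
        init = {i for i, bit in enumerate(bits) if bit}
        for plan in product(ACTIONS, repeat=horizon):
            states = rollout(init, plan)
            if states is None:
                continue  # logically inconsistent candidate
            # L1 distance between binary states and predicted atom probabilities.
            cost = sum(
                abs((i in s) - p[i])
                for s, p in zip(states, state_probs)
                for i in range(N_ATOMS)
            )
            # Penalty for picking actions the network considered unlikely.
            cost += sum(1.0 - q[a] for a, q in zip(plan, action_probs))
            if best is None or cost < best[0]:
                best = (cost, states, plan)
    return best


# Made-up predictions for a two-step trace; the middle state is ambiguous.
state_probs = [[0.9, 0.2, 0.1], [0.6, 0.7, 0.1], [0.1, 0.4, 0.8]]
action_probs = [{"a0": 0.55, "a1": 0.45}, {"a0": 0.3, "a1": 0.7}]
cost, states, plan = project(state_probs, action_probs)
```

Here `states` and `plan` play the role of pseudo-labels: hard, mutually consistent targets that could supervise the next round of training. Enumeration is exponential in the number of atoms and steps, which is one reason a real system would hand the same constraints to a MILP solver.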