Bayesian Inverse Transition Learning: Learning Dynamics From Near-Optimal Trajectories

arXiv stat.ML · April 29, 2026


Key Points

  • The paper addresses how to estimate true transition dynamics in offline model-based reinforcement learning using only near-optimal expert trajectories.
  • It proposes “Inverse Transition Learning,” a constraint-based method that treats the limited coverage of expert data as a useful feature rather than a limitation.
  • The approach integrates these constraints into a Bayesian framework, yielding a posterior distribution over the transition dynamics.
  • Experiments in synthetic environments and real healthcare tasks (ICU patient management under hypotension) show improved decision-making and provide guidance on when knowledge transfer is likely to succeed.
  • Overall, the work demonstrates that near-optimal behavior can substantially improve model identification and downstream control reliability in offline settings.

Abstract

We consider the problem of estimating the transition dynamics T^* from near-optimal expert trajectories in the context of offline model-based reinforcement learning. We develop a novel constraint-based method, Inverse Transition Learning, that treats the limited coverage of the expert trajectories as a *feature*: we use the fact that the expert is near-optimal to inform our estimate of T^*. We integrate our constraints into a Bayesian approach. Across both synthetic environments and real healthcare scenarios, such as Intensive Care Unit (ICU) patient management for hypotension, we demonstrate not only significant improvements in decision-making, but also that our posterior can inform when transfer will be successful.
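To make the core idea concrete, here is a minimal toy sketch (not the paper's actual method or environments) of one simple way to "integrate near-optimality constraints into a Bayesian posterior" over tabular dynamics: fit a Dirichlet posterior to the expert's observed transitions, then keep only posterior samples under which the expert's actions are within `eps` of optimal. The MDP (`T_true`, `R`), the discount `gamma`, and the tolerance `eps` are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.9

# Hypothetical ground-truth dynamics T*[a, s, s'] and rewards R[s, a]
T_true = np.array([
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],  # action 0: mostly stay
    [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]],  # action 1: mostly cycle
])
R = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])

def q_values(T, R, gamma, iters=200):
    """Tabular value iteration; returns Q[s, a]."""
    V = np.zeros(T.shape[1])
    for _ in range(iters):
        Q = R + gamma * np.einsum("ast,t->sa", T, V)
        V = Q.max(axis=1)
    return Q

# Expert policy: greedy under the true dynamics (near-optimal by construction)
pi_expert = q_values(T_true, R, gamma).argmax(axis=1)

# Roll out the expert: coverage is limited to (s, pi_expert(s)) pairs
counts = np.zeros((A, S, S))
s = 0
for _ in range(500):
    a = pi_expert[s]
    s_next = rng.choice(S, p=T_true[a, s])
    counts[a, s, s_next] += 1
    s = s_next

# Dirichlet posterior over each row T[a, s, :] with a uniform prior
alpha = counts + 1.0

def sample_T(alpha, rng):
    T = np.empty_like(alpha)
    for a in range(A):
        for s_ in range(S):
            T[a, s_] = rng.dirichlet(alpha[a, s_])
    return T

def respects_expert(T, eps=0.05):
    """Near-optimality constraint: the expert's action must be
    within eps of optimal in every state under dynamics T."""
    Q = q_values(T, R, gamma)
    return bool(np.all(Q[np.arange(S), pi_expert] >= Q.max(axis=1) - eps))

samples = [sample_T(alpha, rng) for _ in range(300)]
kept = [T for T in samples if respects_expert(T)]
print(f"kept {len(kept)}/{len(samples)} posterior samples after the near-optimality filter")
```

The rejection step stands in for the paper's constraint integration: dynamics that would make the expert's behavior suboptimal are pruned, which is exactly where the expert's limited coverage becomes informative rather than a nuisance. The fraction of samples retained also gives a crude signal of how compatible the learned posterior is with the expert, loosely echoing the paper's point that the posterior can indicate when transfer is likely to succeed.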