Robust Parameter Learning for Uncertain MDPs

arXiv cs.LG / 5/5/2026


Key Points

  • The paper targets learning and verification of unknown Markov decision processes (MDPs) under transition uncertainty, where existing methods often treat each transition probability’s uncertainty independently.
  • It introduces parametric MDPs (pMDPs), representing transition probabilities as expressions over shared parameters, so learned uncertainty properly captures algebraic dependencies among transitions.
  • The authors map (project) uncertainty from observed transition frequencies into the pMDP parameter space to produce a PAC-style uncertainty model for the underlying MDP.
  • Because the resulting models are algorithmically challenging to solve, the authors propose a hierarchy of sound polytopic outer approximations of the induced confidence set to restore tractability.
  • Experiments show the proposed approach yields substantially tighter uncertainty estimates than classical interval-based uncertain MDP learning techniques.
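The tightening from shared parameters can be illustrated with a toy example (this is an illustrative sketch of the general idea, not the paper's method; `p_true`, the sample sizes, and the Hoeffding-style bound are assumptions for the toy). When two state-action pairs are governed by the same latent parameter, their observations pool, so the PAC confidence interval on that parameter shrinks relative to estimating each transition probability independently:

```python
import math
import random

def hoeffding_halfwidth(n, delta):
    # PAC-style confidence half-width for a Bernoulli mean (Hoeffding bound):
    # with probability >= 1 - delta, |estimate - true| <= this value.
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

random.seed(0)
p_true = 0.3   # shared latent parameter (toy assumption)
n = 500        # samples observed at each state-action pair

# Two state-action pairs whose success probability is the *same* parameter p.
counts = [sum(random.random() < p_true for _ in range(n)) for _ in range(2)]
centers = [c / n for c in counts]

delta = 0.05
# Interval-based uncertain MDP: one interval per transition, each from n samples.
per_transition = hoeffding_halfwidth(n, delta)
# Parametric (pMDP) view: both pairs inform the same parameter, so the
# 2n observations pool into a single, tighter interval.
pooled = hoeffding_halfwidth(2 * n, delta)

print(f"per-transition half-width:        {per_transition:.4f}")
print(f"shared-parameter half-width (2n): {pooled:.4f}")
assert pooled < per_transition
```

Pooling halves the variance, so the half-width contracts by a factor of sqrt(2) here; with more transitions tied to one parameter, the contraction grows accordingly.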

Abstract

Learning-based approaches to verifying unknown Markov decision processes (MDPs) often employ uncertain MDPs. These models use, for example, confidence intervals to capture transition uncertainty and allow synthesis of policies that are robust to this uncertainty. However, this approach typically quantifies uncertainty independently for individual transition probabilities, ignoring dependencies due to shared latent quantities. We propose to learn such models using parametric MDPs (pMDPs), where transition probabilities are expressions over a set of parameters. We project statistical uncertainty from empirical transition frequencies onto the pMDP's parameter space, yielding a probably approximately correct (PAC) uncertainty model for the underlying MDP that respects the algebraic dependencies between transitions. The resulting models are algorithmically challenging to solve, so we propose a hierarchy of sound polytopic outer approximations of the induced confidence set. We implement and evaluate our approach, demonstrating substantially tighter uncertainty estimates than classical interval-based uncertain MDP learning techniques.
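One way to see why the induced confidence set needs outer approximation: transition probabilities given by polynomial expressions over a parameter box map to a generally nonconvex set of transition vectors. The simplest sound polytopic outer approximation is a box computed by interval arithmetic, which is the coarsest member of the kind of hierarchy the abstract describes. The sketch below is a generic illustration under assumed parameter intervals and expressions (`p`, `q`, `p*q`, `(1-p)*q` are not taken from the paper):

```python
def interval_mul(a, b):
    # Sound product of two intervals: the image of x*y over a box
    # is bracketed by the extreme corner products.
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def interval_one_minus(a):
    # 1 - [lo, hi] = [1 - hi, 1 - lo]
    return (1.0 - a[1], 1.0 - a[0])

# Hypothetical PAC confidence intervals on the pMDP parameters.
p = (0.25, 0.35)
q = (0.60, 0.70)

# Transition expressions over the parameters, e.g. Pr = p*q and Pr = (1-p)*q.
# Interval evaluation yields a box (a polytope) that soundly contains the
# true, possibly nonconvex, induced set of transition probabilities.
pq_bounds = interval_mul(p, q)
not_p_q_bounds = interval_mul(interval_one_minus(p), q)

print("p*q     in", pq_bounds)
print("(1-p)*q in", not_p_q_bounds)
```

The box is sound but loose: it ignores that both expressions share the same `p` and `q`. Tighter polytopes in a hierarchy add linear constraints coupling the expressions, trading computational effort for precision.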