Robust Parameter Learning for Uncertain MDPs

arXiv cs.LG / 5/5/2026


Key Points

  • The paper targets learning and verification of unknown Markov decision processes (MDPs) under transition uncertainty, where existing methods often treat each transition probability’s uncertainty independently.
  • It introduces parametric MDPs (pMDPs), representing transition probabilities as expressions over shared parameters, so learned uncertainty properly captures algebraic dependencies among transitions.
  • The authors map (project) uncertainty from observed transition frequencies into the pMDP parameter space to produce a PAC-style uncertainty model for the underlying MDP.
  • Because the resulting models are algorithmically challenging to solve, the authors propose a hierarchy of sound polytopic outer approximations of the induced confidence set to restore tractability.
  • Experiments show the proposed approach yields substantially tighter uncertainty estimates than classical interval-based uncertain MDP learning techniques.
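The tightening from shared parameters can be illustrated with a toy example (this is an illustrative sketch of the general idea, not the paper's method; `p_true`, the sample sizes, and the Hoeffding-style bound are assumptions for the toy). When two state-action pairs are governed by the same latent parameter, their observations pool, so the PAC confidence interval on that parameter shrinks relative to estimating each transition probability independently:

```python
import math
import random

def hoeffding_halfwidth(n, delta):
    # PAC-style confidence half-width for a Bernoulli mean (Hoeffding bound):
    # with probability >= 1 - delta, |estimate - true| <= this value.
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

random.seed(0)
p_true = 0.3   # shared latent parameter (toy assumption)
n = 500        # samples observed at each state-action pair

# Two state-action pairs whose success probability is the *same* parameter p.
counts = [sum(random.random() < p_true for _ in range(n)) for _ in range(2)]
centers = [c / n for c in counts]

delta = 0.05
# Interval-based uncertain MDP: one interval per transition, each from n samples.
per_transition = hoeffding_halfwidth(n, delta)
# Parametric (pMDP) view: both pairs inform the same parameter, so the
# 2n observations pool into a single, tighter interval.
pooled = hoeffding_halfwidth(2 * n, delta)

print(f"per-transition half-width:        {per_transition:.4f}")
print(f"shared-parameter half-width (2n): {pooled:.4f}")
assert pooled < per_transition
```

Pooling halves the variance, so the half-width contracts by a factor of sqrt(2) here; with more transitions tied to one parameter, the contraction grows accordingly.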

Abstract

Learning-based approaches to verifying unknown Markov decision processes (MDPs) often employ uncertain MDPs. These models use, for example, confidence intervals to capture transition uncertainty and allow synthesis of policies that are robust to this uncertainty. However, this approach typically quantifies uncertainty independently for individual transition probabilities, ignoring dependencies due to shared latent quantities. We propose to learn such models using parametric MDPs (pMDPs), where transition probabilities are expressions over a set of parameters. We project statistical uncertainty from empirical transition frequencies onto the pMDP's parameter space, yielding a probably approximately correct (PAC) uncertainty model for the underlying MDP that respects the algebraic dependencies between transitions. The resulting models are algorithmically challenging to solve, so we propose a hierarchy of sound polytopic outer approximations of the induced confidence set. We implement and evaluate our approach, demonstrating substantially tighter uncertainty estimates than classical interval-based uncertain MDP learning techniques.
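One way to see why the induced confidence set needs outer approximation: transition probabilities given by polynomial expressions over a parameter box map to a generally nonconvex set of transition vectors. The simplest sound polytopic outer approximation is a box computed by interval arithmetic, which is the coarsest member of the kind of hierarchy the abstract describes. The sketch below is a generic illustration under assumed parameter intervals and expressions (`p`, `q`, `p*q`, `(1-p)*q` are not taken from the paper):

```python
def interval_mul(a, b):
    # Sound product of two intervals: the image of x*y over a box
    # is bracketed by the extreme corner products.
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def interval_one_minus(a):
    # 1 - [lo, hi] = [1 - hi, 1 - lo]
    return (1.0 - a[1], 1.0 - a[0])

# Hypothetical PAC confidence intervals on the pMDP parameters.
p = (0.25, 0.35)
q = (0.60, 0.70)

# Transition expressions over the parameters, e.g. Pr = p*q and Pr = (1-p)*q.
# Interval evaluation yields a box (a polytope) that soundly contains the
# true, possibly nonconvex, induced set of transition probabilities.
pq_bounds = interval_mul(p, q)
not_p_q_bounds = interval_mul(interval_one_minus(p), q)

print("p*q     in", pq_bounds)
print("(1-p)*q in", not_p_q_bounds)
```

The box is sound but loose: it ignores that both expressions share the same `p` and `q`. Tighter polytopes in a hierarchy add linear constraints coupling the expressions, trading computational effort for precision.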