
Planning in entropy-regularized Markov decision processes and games

arXiv cs.LG / 4/22/2026



Key Points

The paper introduces SmoothCruiser, a new planning algorithm for estimating value functions in entropy-regularized Markov decision processes (MDPs) and two-player games using a generative environment model.

Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{O}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
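To illustrate what "smoothness promoted by the regularization" can mean, the following is a minimal sketch of the standard entropy-regularized (log-sum-exp) backup, which replaces the hard max over actions with a smooth function. It is not the SmoothCruiser algorithm itself; the function names and the regularization strength `lam` are assumptions introduced here for illustration.

```python
import math

def soft_value(q_values, lam):
    """Entropy-regularized value: lam * log(sum_a exp(Q(a) / lam)).

    Illustrative only: `lam` (regularization strength) is a hypothetical
    parameter name, not taken from the paper.
    """
    m = max(q_values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))

def soft_policy(q_values, lam):
    """Softmax policy; it equals the gradient of soft_value w.r.t. q_values."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]
    for lam in (1.0, 0.1, 0.01):
        print(lam, soft_value(q, lam), soft_policy(q, lam))
```

As `lam` shrinks, `soft_value` approaches the unregularized `max(q_values)`, and it always lies between `max(q_values)` and `max(q_values) + lam * log(len(q_values))`. Unlike the hard max, the soft value is differentiable everywhere, which is the kind of structure the abstract says the algorithm exploits.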

