Reinforcement Learning with Markov Risk Measures and Multipattern Risk Approximation

arXiv cs.LG / 5/4/2026



Key Points

The paper studies risk-averse finite-horizon Markov Decision Processes by introducing a new class of Markov coherent risk measures called mini-batch measures.
It defines a class of “multipattern” risk-averse problems that generalizes the class of linear systems.
The authors combine these ideas into a feature-based Q-learning approach with multipattern Q-factor approximation.
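They prove a high-probability regret bound of order O(H^2 N^H sqrt(K)), where H is the horizon, N is the mini-batch size, and K is the number of episodes, and propose a more economical variant of the Q-learning method that streamlines the backward (policy evaluation) step.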
Experiments on a stochastic assignment problem and a short-horizon multi-armed bandit illustrate the theory.
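
To make the idea of evaluating risk on a mini-batch concrete, here is a minimal sketch. The summary does not give the paper's definition of mini-batch measures, so this is only a stand-in: it applies a familiar coherent risk measure (an empirical Conditional Value-at-Risk) to a sample of N costs. The function name `empirical_cvar` and the parameter `alpha` are assumptions made for illustration and do not come from the paper.

```python
import math


def empirical_cvar(costs, alpha):
    """Mean of the worst ceil(alpha * N) costs in a mini-batch of N samples.

    Illustrative stand-in only (not the paper's mini-batch measure).
    Higher cost is worse; alpha = 1 recovers the plain sample mean.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    if not costs:
        raise ValueError("costs must be non-empty")
    n = len(costs)
    k = math.ceil(alpha * n)                      # number of tail samples kept
    worst = sorted(costs, reverse=True)[:k]       # the k largest costs
    return sum(worst) / k


# Hypothetical mini-batch of N = 4 sampled costs.
batch = [1.0, 2.0, 3.0, 4.0]
risk_neutral = empirical_cvar(batch, 1.0)   # plain average of the batch
risk_averse = empirical_cvar(batch, 0.5)    # average of the two worst costs
```

A risk-averse evaluation of this kind is never below the sample mean, which is the sense in which it penalizes bad outcomes more heavily than an expected-value criterion does.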

Abstract

For a risk-averse finite-horizon Markov Decision Problem, we introduce a special class of Markov coherent risk measures, called mini-batch measures. We also define the class of multipattern risk-averse problems that generalizes the class of linear systems. We use both concepts in a feature-based $Q$-learning method with multipattern $Q$-factor approximation and we prove a high-probability regret bound of $\mathcal{O}\big(H^2 N^H \sqrt{K}\big)$, where $H$ is the horizon, $N$ is the mini-batch size, and $K$ is the number of episodes. We also propose an economical version of the $Q$-learning method that streamlines the policy evaluation (backward) step. The theoretical results are illustrated on a stochastic assignment problem and a short-horizon multi-armed bandit problem.
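
The scaling of the stated regret bound can be explored numerically. The sketch below evaluates only the order term H^2 N^H sqrt(K) from the abstract; the hidden constant and any logarithmic factors are unknown here and are omitted, so the numbers are meaningful only as ratios. The function names are hypothetical.

```python
import math


def regret_bound_order(H, N, K):
    """Order term H^2 * N^H * sqrt(K) of the bound, with constants omitted."""
    return H**2 * N**H * math.sqrt(K)


def per_episode_order(H, N, K):
    """Order term divided by K: average regret per episode, up to constants."""
    return regret_bound_order(H, N, K) / K


# Quadrupling the number of episodes doubles the cumulative order term ...
ratio_in_K = regret_bound_order(3, 2, 400) / regret_bound_order(3, 2, 100)

# ... and halves the per-episode term, which decays like 1 / sqrt(K).
ratio_per_episode = per_episode_order(3, 2, 400) / per_episode_order(3, 2, 100)

# Lengthening the horizon by one step multiplies the term by N * ((H + 1) / H)^2.
ratio_in_H = regret_bound_order(4, 2, 100) / regret_bound_order(3, 2, 100)
```

The exponential factor N^H is what ties the bound to short horizons and small mini-batches, which is consistent with the short-horizon bandit used as one of the test problems.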


