Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention
arXiv stat.ML / 3/24/2026
Key Points
- The paper extends the classic multi-armed bandit framework with an “abstention” action: in each round the agent may decline the stochastic reward of its chosen arm and instead incur a fixed regret or receive a guaranteed reward.
- It addresses whether computationally efficient algorithms can achieve both asymptotically optimal and minimax-optimal regret under this abstention setting.
- The authors propose and analyze new algorithms whose regret bounds match information-theoretic lower bounds, establishing optimality in a rigorous sense.
- The study quantifies how much abstention improves performance over standard bandits, with extensive numerical experiments confirming the practical benefits predicted by the theory.
- The work is framed as groundwork for applying abstention-style options to other online decision-making problems beyond multi-armed bandits.
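To make the abstention mechanic concrete, here is a minimal illustrative sketch, not the paper's algorithm: a standard UCB1 learner augmented with an abstention option. The rule is a plausible baseline assumed for illustration: when even the most optimistic arm index falls below the guaranteed abstention payoff, the agent abstains and collects the fixed reward instead.

```python
import math
import random

def ucb_with_abstention(arm_means, abstain_reward, horizon, seed=0):
    """Illustrative UCB1 sketch with an abstention option.

    arm_means: true Bernoulli success probabilities (for simulation only).
    abstain_reward: fixed payoff the agent receives when it abstains.
    Returns the total reward collected over `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k       # pulls per arm
    sums = [0.0] * k       # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        # Pull each arm once before its UCB index is meaningful.
        untried = [i for i in range(k) if counts[i] == 0]
        if untried:
            arm = untried[0]
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            ucb = [sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
                   for i in range(k)]
            best = max(range(k), key=lambda i: ucb[i])
            if ucb[best] < abstain_reward:
                # Even the optimistic estimate is worse than the
                # guaranteed payoff, so abstain this round.
                total += abstain_reward
                continue
            arm = best
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total
```

With a high abstention reward relative to the arm means, the learner abstains once the exploration bonuses shrink, capping its per-round regret; with a low abstention reward it reduces to plain UCB1. The actual algorithms and confidence-bound choices in the paper differ; this only illustrates the decision structure.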