Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention
arXiv stat.ML / 2026/3/24
Key points
- The paper extends the classic multi-armed bandit framework with an "abstention" action: in any round, the agent may decline the stochastic reward of an arm and instead receive a fixed, known payoff (equivalently, incur a fixed regret).
- It asks whether computationally efficient algorithms can simultaneously achieve asymptotically optimal and minimax-optimal regret in this abstention setting.
- The authors propose and analyze new algorithms whose regret bounds match information-theoretic lower bounds, establishing optimality in a rigorous sense.
- The study quantifies how the abstention option improves achievable regret relative to standard bandits, and backs the theory with extensive numerical experiments demonstrating practical gains.
- The work is framed as groundwork for applying abstention-style options to other online decision-making problems beyond multi-armed bandits.
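To make the abstention mechanism concrete, here is a minimal sketch of a bandit loop with an abstain action. This is an illustration only, not the paper's proposed algorithm: it runs plain UCB1 over the stochastic arms and treats abstention as a deterministic action with a known payoff `r_abstain` (an assumption of this sketch), pulling an arm only when its optimistic index exceeds the guaranteed payoff.

```python
import math
import random


def ucb_with_abstention(means, r_abstain, horizon, seed=0):
    """UCB1 over Bernoulli arms, plus an 'abstain' action whose payoff
    r_abstain is known in advance and therefore needs no exploration.

    Returns (total_reward, pull_counts, abstain_count).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k       # pulls per stochastic arm
    sums = [0.0] * k       # cumulative observed reward per arm
    total_reward = 0.0
    abstains = 0

    for t in range(1, horizon + 1):
        # Optimistic (UCB1) index for each stochastic arm;
        # an unpulled arm gets infinite index so it is tried once.
        indices = []
        for i in range(k):
            if counts[i] == 0:
                indices.append(float("inf"))
            else:
                bonus = math.sqrt(2.0 * math.log(t) / counts[i])
                indices.append(sums[i] / counts[i] + bonus)

        best = max(range(k), key=lambda i: indices[i])
        if indices[best] >= r_abstain:
            # Gamble on the most promising stochastic arm.
            reward = 1.0 if rng.random() < means[best] else 0.0
            counts[best] += 1
            sums[best] += reward
            total_reward += reward
        else:
            # Abstain: take the guaranteed payoff instead.
            abstains += 1
            total_reward += r_abstain

    return total_reward, counts, abstains
```

Note the asymmetry the paper exploits in a far more refined way: the abstain action carries no uncertainty, so a well-designed index policy never "explores" it, and the guaranteed payoff caps the per-round regret whenever all stochastic arms look unpromising.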

