Best of both worlds: Stochastic & adversarial best-arm identification

arXiv stat.ML / 4/17/2026


Key Points

  • The paper studies multi-armed bandit best-arm identification when rewards may be either stochastic or adversarial.
  • While a random uniform strategy achieves the optimal error rate in the fully adversarial setting, it is not optimal under stochastic rewards.
  • The authors show it is impossible in general to design a learner that achieves optimal rates in both settings without knowing which reward model applies.
  • They derive a lower bound describing the best achievable stochastic error rate among strategies that are required to be robust to adversarial rewards.
  • The paper proposes a simple parameter-free algorithm whose stochastic error matches the lower bound up to logarithmic factors and that is also robust to adversarial rewards.
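To make the baseline concrete, here is a minimal sketch of the uniform random learner mentioned above: sample arms uniformly at random for the whole budget, then recommend the arm with the best empirical mean. The function name `uniform_best_arm` and the reward oracle `pull` are hypothetical; the paper's own parameter-free algorithm is not specified here.

```python
import random

def uniform_best_arm(pull, n_arms, budget, rng=random):
    # Uniform random learner: allocate the whole budget uniformly at
    # random across arms, then recommend the arm with the highest
    # empirical mean. `pull(arm)` is a hypothetical reward oracle
    # returning values in [0, 1].
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for _ in range(budget):
        arm = rng.randrange(n_arms)
        sums[arm] += pull(arm)
        counts[arm] += 1
    return max(range(n_arms),
               key=lambda a: sums[a] / counts[a] if counts[a] else float("-inf"))
```

Because the allocation ignores the observed rewards entirely, no adversarial reward sequence can steer it, which is why this strategy is robust; the same obliviousness is what makes it suboptimal on stochastic instances with large gaps.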

Abstract

We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.
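For contrast with the uniform learner, a typical gap-adaptive stochastic strategy looks like successive elimination: pull every surviving arm once per round and drop arms whose confidence interval falls below the leader's. This is a standard textbook routine shown only to illustrate the kind of adaptive strategy the abstract contrasts with uniform sampling; it is not the paper's algorithm, and its confidence bounds assume i.i.d. rewards, so an adversary can mislead it.

```python
import math

def successive_elimination(pull, n_arms, budget, delta=0.05):
    # Successive elimination -- a classic stochastic best-arm routine.
    # Its Hoeffding-style confidence bounds assume stochastic rewards,
    # so it is not robust to adversarial reward sequences.
    active = list(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    t = 0
    while t < budget and len(active) > 1:
        for arm in list(active):        # one pull per surviving arm
            if t >= budget:
                break
            sums[arm] += pull(arm)
            counts[arm] += 1
            t += 1
        if any(counts[a] == 0 for a in active):
            continue  # budget smaller than number of arms
        def radius(a):  # Hoeffding-style confidence radius
            return math.sqrt(math.log(2 * n_arms * counts[a] / delta)
                             / (2 * counts[a]))
        leader = max(active, key=lambda a: sums[a] / counts[a])
        lcb = sums[leader] / counts[leader] - radius(leader)
        active = [a for a in active
                  if sums[a] / counts[a] + radius(a) >= lcb]
    return max(active,
               key=lambda a: sums[a] / counts[a] if counts[a] else float("-inf"))
```

On easy stochastic instances this focuses the budget on near-optimal arms and beats uniform sampling, but its reward-dependent allocation is exactly the opening an adversary exploits; the paper's contribution is characterizing how much of this stochastic advantage can be retained while staying adversarially robust.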
