Best of both worlds: Stochastic & adversarial best-arm identification

arXiv stat.ML / 4/17/2026


Key Points

  • The paper studies multi-armed bandit best-arm identification when rewards may be either stochastic or adversarial.
  • While a random uniform strategy achieves the optimal error rate in the fully adversarial setting, it is not optimal under stochastic rewards.
  • The authors show it is impossible in general to design a learner that achieves optimal rates in both settings without knowing which reward model applies.
  • They derive a lower bound describing the best achievable stochastic error rate among strategies that are required to be robust to adversarial rewards.
  • The paper proposes a simple parameter-free algorithm whose stochastic error matches the lower bound up to logarithmic factors and that is also robust to adversarial rewards.
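To make the baseline concrete, here is a minimal sketch of the uniform random learner mentioned above: sample arms uniformly at random for the whole budget, then recommend the arm with the best empirical mean. The function name `uniform_best_arm` and the reward oracle `pull` are hypothetical; the paper's own parameter-free algorithm is not specified here.

```python
import random

def uniform_best_arm(pull, n_arms, budget, rng=random):
    # Uniform random learner: allocate the whole budget uniformly at
    # random across arms, then recommend the arm with the highest
    # empirical mean. `pull(arm)` is a hypothetical reward oracle
    # returning values in [0, 1].
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for _ in range(budget):
        arm = rng.randrange(n_arms)
        sums[arm] += pull(arm)
        counts[arm] += 1
    return max(range(n_arms),
               key=lambda a: sums[a] / counts[a] if counts[a] else float("-inf"))
```

Because the allocation ignores the observed rewards entirely, no adversarial reward sequence can steer it, which is why this strategy is robust; the same obliviousness is what makes it suboptimal on stochastic instances with large gaps.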

Abstract

We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.
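For contrast with the uniform learner, a typical gap-adaptive stochastic strategy looks like successive elimination: pull every surviving arm once per round and drop arms whose confidence interval falls below the leader's. This is a standard textbook routine shown only to illustrate the kind of adaptive strategy the abstract contrasts with uniform sampling; it is not the paper's algorithm, and its confidence bounds assume i.i.d. rewards, so an adversary can mislead it.

```python
import math

def successive_elimination(pull, n_arms, budget, delta=0.05):
    # Successive elimination -- a classic stochastic best-arm routine.
    # Its Hoeffding-style confidence bounds assume stochastic rewards,
    # so it is not robust to adversarial reward sequences.
    active = list(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    t = 0
    while t < budget and len(active) > 1:
        for arm in list(active):        # one pull per surviving arm
            if t >= budget:
                break
            sums[arm] += pull(arm)
            counts[arm] += 1
            t += 1
        if any(counts[a] == 0 for a in active):
            continue  # budget smaller than number of arms
        def radius(a):  # Hoeffding-style confidence radius
            return math.sqrt(math.log(2 * n_arms * counts[a] / delta)
                             / (2 * counts[a]))
        leader = max(active, key=lambda a: sums[a] / counts[a])
        lcb = sums[leader] / counts[leader] - radius(leader)
        active = [a for a in active
                  if sums[a] / counts[a] + radius(a) >= lcb]
    return max(active,
               key=lambda a: sums[a] / counts[a] if counts[a] else float("-inf"))
```

On easy stochastic instances this focuses the budget on near-optimal arms and beats uniform sampling, but its reward-dependent allocation is exactly the opening an adversary exploits; the paper's contribution is characterizing how much of this stochastic advantage can be retained while staying adversarially robust.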
