AI Navigate

A Further Efficient Algorithm with Best-of-Both-Worlds Guarantees for $m$-Set Semi-Bandit Problem

arXiv cs.LG / 3/13/2026

📰 NewsIdeas & Deep AnalysisModels & Research

Key Points

  • The paper extends FTPL with geometric resampling to m-set semi-bandits, establishing an optimal adversarial regret of O(sqrt(mdT)).
  • It demonstrates Best-of-Both-Worlds performance by achieving logarithmic regret in the stochastic setting for the same problem and distributions (Fréchet and Pareto).
  • It introduces a conditional geometric resampling extension that reduces loss-estimation complexity from O(d^2) to O(md(log(d/m)+1)) without sacrificing regret performance.
  • The results rely on specific distributional choices and parameter settings, highlighting both theoretical optimality and practical computational gains.

Abstract

This paper studies the optimality and complexity of Follow-the-Perturbed-Leader (FTPL) policy in m-set semi-bandit problems. FTPL has been studied extensively as a promising candidate of an efficient algorithm with favorable regret for adversarial combinatorial semi-bandits. Nevertheless, the optimality of FTPL has still been unknown unlike Follow-the-Regularized-Leader (FTRL) whose optimality has been proved for various tasks of online learning. In this paper, we extend the analysis of FTPL with geometric resampling (GR) to m-set semi-bandits, which is a special case of combinatorial semi-bandits, showing that FTPL with Fr\'{e}chet and Pareto distributions with certain parameters achieves the best possible regret of O(\sqrt{mdT}) in adversarial setting. We also show that FTPL with Fr\'{e}chet and Pareto distributions with a certain parameter achieves a logarithmic regret for stochastic setting, meaning the Best-of-Both-Worlds optimality of FTPL for m-set semi-bandit problems. Furthermore, we extend the conditional geometric resampling to m-set semi-bandits for efficient loss estimation in FTPL, reducing the computational complexity from O(d^2) of the original geometric resampling to O(md(\log(d/m)+1)) without sacrificing the regret performance.