Efficient Best-of-Both-Worlds Algorithms for Contextual Combinatorial Semi-Bandits

arXiv stat.ML / 3/27/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper proposes the first “best-of-both-worlds” algorithm for contextual combinatorial semi-bandits, achieving near-optimal o(sqrt(T)) regret in adversarial settings and near-logarithmic o(log T) regret in corrupted stochastic settings simultaneously.
It builds on a Follow-the-Regularized-Leader (FTRL) framework using a Shannon entropy regularizer to provide a flexible approach with efficient algorithmic structure.
A key contribution is addressing the computational bottleneck in FTRL/online stochastic mirror descent: the high-dimensional projection step per round is reformulated using Karush-Kuhn-Tucker conditions.
The authors reduce the projection to a single-variable root-finding problem, enabling dramatically faster per-round computations.
Experiments show the method retains the desired regret guarantees while delivering substantial real-time, large-scale speed-ups, supporting practical deployment potential.

Abstract

We introduce the first best-of-both-worlds algorithm for contextual combinatorial semi-bandits that simultaneously guarantees

\widetilde{\mathcal{O}}(\sqrt{T})

regret in the adversarial regime and

\widetilde{\mathcal{O}}(\ln T)

regret in the corrupted stochastic regime. Our approach builds on the Follow-the-Regularized-Leader (FTRL) framework equipped with a Shannon entropy regularizer, yielding a flexible method that admits efficient implementations. Beyond regret bounds, we tackle the practical bottleneck in FTRL (or, equivalently, Online Stochastic Mirror Descent) arising from the high-dimensional projection step encountered in each round of interaction. By leveraging the Karush-Kuhn-Tucker conditions, we transform the

K

-dimensional convex projection problem into a single-variable root-finding problem, dramatically accelerating each round. Empirical evaluations demonstrate that this combined strategy not only attains the attractive regret bounds of best-of-both-worlds algorithms but also delivers substantial per-round speed-ups, making it well-suited for large-scale, real-time applications.