A Further Efficient Algorithm with Best-of-Both-Worlds Guarantees for $m$-Set Semi-Bandit Problem

arXiv cs.LG / 3/13/2026

Key Points

The paper extends the analysis of FTPL with geometric resampling to $m$-set semi-bandits and shows that FTPL attains the optimal adversarial regret of $O(\sqrt{mdT})$.
It establishes Best-of-Both-Worlds optimality: with Fréchet and Pareto perturbations, FTPL also achieves logarithmic regret in the stochastic setting.
It extends conditional geometric resampling to $m$-set semi-bandits, reducing the computational complexity of loss estimation from $O(d^2)$ to $O(md(\log(d/m)+1))$ without sacrificing regret performance.
The guarantees hold for Fréchet and Pareto perturbation distributions with particular parameter choices, so the results combine theoretical optimality with a lower computational cost.
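To get a feel for the size of the complexity reduction stated above, the two bounds can be compared numerically. This is only a rough sketch: it ignores the constants hidden by the big-O notation, assumes a natural logarithm, and reads $d$ as the number of base arms and $m$ as the size of the selected set. The function names are hypothetical and not taken from the paper.

```python
import math


def gr_cost(d: int) -> float:
    """Order of the original geometric resampling bound, O(d^2), constants dropped."""
    return float(d ** 2)


def cgr_cost(d: int, m: int) -> float:
    """Order of the conditional variant's bound, O(md(log(d/m) + 1)), constants dropped.

    Assumption: natural logarithm; the base only changes a constant factor.
    """
    return m * d * (math.log(d / m) + 1)


if __name__ == "__main__":
    d = 1024
    for m in (1, 8, 64, 1024):
        ratio = gr_cost(d) / cgr_cost(d, m)
        print(f"d={d}, m={m}: d^2 / (md(log(d/m)+1)) = {ratio:.2f}")
```

When $m = d$ the two expressions coincide, and the gap widens as $m$ becomes small relative to $d$.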

Abstract

This paper studies the optimality and complexity of Follow-the-Perturbed-Leader (FTPL) policy in $m$-set semi-bandit problems. FTPL has been studied extensively as a promising candidate of an efficient algorithm with favorable regret for adversarial combinatorial semi-bandits. Nevertheless, the optimality of FTPL has still been unknown unlike Follow-the-Regularized-Leader (FTRL) whose optimality has been proved for various tasks of online learning. In this paper, we extend the analysis of FTPL with geometric resampling (GR) to $m$-set semi-bandits, which is a special case of combinatorial semi-bandits, showing that FTPL with Fréchet and Pareto distributions with certain parameters achieves the best possible regret of $O(\sqrt{mdT})$ in adversarial setting. We also show that FTPL with Fréchet and Pareto distributions with a certain parameter achieves a logarithmic regret for stochastic setting, meaning the Best-of-Both-Worlds optimality of FTPL for $m$-set semi-bandit problems. Furthermore, we extend the conditional geometric resampling to $m$-set semi-bandits for efficient loss estimation in FTPL, reducing the computational complexity from $O(d^2)$ of the original geometric resampling to $O(md(\log(d/m)+1))$ without sacrificing the regret performance.
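A minimal sketch of one round of FTPL with plain geometric resampling on an $m$-set semi-bandit may help make the abstract concrete. It is an illustration under stated assumptions, not the paper's algorithm: the Fréchet shape `alpha=2.0`, the fixed learning rate `eta`, the truncation `max_iter`, and all function names are hypothetical choices, and the conditional resampling variant is not implemented here.

```python
import numpy as np


def sample_frechet(rng, d, alpha=2.0):
    """Inverse-transform sample from a Frechet law with CDF exp(-x**(-alpha))."""
    u = rng.uniform(size=d)
    return (-np.log(u)) ** (-1.0 / alpha)


def select_m_set(cum_loss_est, eta, m, rng, alpha=2.0):
    """Return the m arms with the smallest perturbed cumulative loss estimate."""
    noise = sample_frechet(rng, len(cum_loss_est), alpha)
    perturbed = cum_loss_est - noise / eta
    return np.argpartition(perturbed, m - 1)[:m]


def geometric_resampling(cum_loss_est, eta, m, played, rng, alpha=2.0, max_iter=10_000):
    """For each played arm, count fresh draws until that arm is selected again.

    The count is geometric with mean 1 / P(arm is selected), so it acts as an
    estimate of the inverse selection probability (truncated at max_iter).
    """
    counts = np.zeros(len(cum_loss_est))
    pending = {int(i) for i in played}
    for k in range(1, max_iter + 1):
        for i in select_m_set(cum_loss_est, eta, m, rng, alpha):
            if int(i) in pending:
                counts[i] = k
                pending.discard(int(i))
        if not pending:
            break
    for i in pending:  # arms never re-selected within the cap
        counts[i] = max_iter
    return counts


def ftpl_round(cum_loss_est, loss, eta, m, rng):
    """Play one round: select m arms, observe their losses, update the estimates."""
    played = select_m_set(cum_loss_est, eta, m, rng)
    counts = geometric_resampling(cum_loss_est, eta, m, played, rng)
    loss_est = np.zeros_like(cum_loss_est)
    loss_est[played] = loss[played] * counts[played]  # semi-bandit feedback only
    return played, cum_loss_est + loss_est


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m = 6, 2
    cum = np.zeros(d)
    for _ in range(5):
        played, cum = ftpl_round(cum, rng.uniform(size=d), eta=1.0, m=m, rng=rng)
    print(cum)
```

Only the arms that were actually played receive a nonzero loss estimate, which reflects the semi-bandit feedback model; the resampling loop is the step whose cost the conditional variant is designed to reduce.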
