Best-of-Both-Worlds Multi-Dueling Bandits: Unified Algorithms for Stochastic and Adversarial Preferences under Condorcet and Borda Objectives

arXiv cs.LG / 3/20/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper presents the first best-of-both-worlds algorithms for multi-dueling bandits under both Condorcet and Borda objectives, performing optimally in stochastic and adversarial environments without knowing the regime in advance.
For Condorcet, it introduces MetaDueling, a black-box reduction that transforms any dueling bandit algorithm into a multi-dueling bandit algorithm by converting multi-way winner feedback into an unbiased pairwise signal.
Instantiating the MetaDueling reduction with Versatile-DB yields the first best-of-both-worlds algorithm for Condorcet multi-dueling with adversarial regret O(sqrt(KT)) and instance-optimal stochastic regret O(sum_{i != a*} log T / Δ_i).
For the Borda setting, the paper proposes AlgBorda, achieving stochastic regret O(K^2 log KT + K log^2 T + sum_{i: Δ_i^B > 0} (K log KT)/(Δ_i^B)^2) and adversarial regret O(K sqrt(T log KT) + K^{1/3} T^{2/3} (log K)^{1/3}) without regime knowledge.
The work provides matching lower bounds for Condorcet and near-optimal upper bounds for Borda, aligning with or improving the best-known results in the literature.

Abstract

Multi-dueling bandits, where a learner selects

m \geq 2

arms per round and observes only the winner, arise naturally in many applications including ranking and recommendation systems, yet a fundamental question has remained open: can a single algorithm perform optimally in both stochastic and adversarial environments, without knowing which regime it faces? We answer this affirmatively, providing the first best-of-both-worlds algorithms for multi-dueling bandits under both Condorcet and Borda objectives. For the Condorcet setting, we propose \texttt{MetaDueling}, a black-box reduction that converts any dueling bandit algorithm into a multi-dueling bandit algorithm by transforming multi-way winner feedback into an unbiased pairwise signal. Instantiating our reduction with \texttt{Versatile-DB} yields the first best-of-both-worlds algorithm for multi-dueling bandits: it achieves

O(\sqrt{KT})

pseudo-regret against adversarial preferences and the instance-optimal

O\!\left(\sum_{i \neq a^\star} \frac{\log T}{\Delta_i}\right)

pseudo-regret under stochastic preferences, both simultaneously and without prior knowledge of the regime. For the Borda setting, we propose \AlgBorda, a stochastic-and-adversarial algorithm that achieves

O\left(K^2 \log KT + K \log^2 T + \sum_{i: \Delta_i^{\mathrm{B}} > 0} \frac{K\log KT}{(\Delta_i^{\mathrm{B}})^2}\right)

regret in stochastic environments and

O\left(K \sqrt{T \log KT} + K^{1/3} T^{2/3} (\log K)^{1/3}\right)

regret against adversaries, again without prior knowledge of the regime. We complement our upper bounds with matching lower bounds for the Condorcet setting. For the Borda setting, our upper bounds are near-optimal with respect to the lower bounds (within a factor of

K

) and match the best-known results in the literature.

Attacks On Data Centers, Qwen3.5 In All Sizes, DeepSeek’s Huawei Play, Apple’s Multimodal Tokenizer

The Batch

Your AI generated code is "almost right", and that is actually WORSE than it being "wrong".

Dev.to

Lessons from Academic Plagiarism Tools for SaaS Product Development

Dev.to

Core Allocation Optimization for Energy‑Efficient Multi‑Core Scheduling in ARINC650 Systems

Dev.to

KI in der amtlichen Recherche beim DPMA: Was Patentanwälte bei Neuanmeldungen jetzt beachten sollten (Stand: März 2026)

Dev.to

Best-of-Both-Worlds Multi-Dueling Bandits: Unified Algorithms for Stochastic and Adversarial Preferences under Condorcet and Borda Objectives

Key Points

Abstract

Related Articles

Attacks On Data Centers, Qwen3.5 In All Sizes, DeepSeek’s Huawei Play, Apple’s Multimodal Tokenizer

Your AI generated code is "almost right", and that is actually WORSE than it being "wrong".

Lessons from Academic Plagiarism Tools for SaaS Product Development

Core Allocation Optimization for Energy‑Efficient Multi‑Core Scheduling in ARINC650 Systems

KI in der amtlichen Recherche beim DPMA: Was Patentanwälte bei Neuanmeldungen jetzt beachten sollten (Stand: März 2026)

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer

Key Points

Abstract

Related Articles

Attacks On Data Centers, Qwen3.5 In All Sizes, DeepSeek’s Huawei Play, Apple’s Multimodal Tokenizer

Your AI generated code is "almost right", and that is actually WORSE than it being "wrong".

Lessons from Academic Plagiarism Tools for SaaS Product Development

**Core Allocation Optimization for Energy‑Efficient Multi‑Core Scheduling in ARINC650 Systems**

KI in der amtlichen Recherche beim DPMA: Was Patentanwälte bei Neuanmeldungen jetzt beachten sollten (Stand: März 2026)

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer

Core Allocation Optimization for Energy‑Efficient Multi‑Core Scheduling in ARINC650 Systems