A Perturbation Approach to Unconstrained Linear Bandits

arXiv stat.ML / 3/31/2026

Key Points

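The paper revisits the perturbation approach of Abernethy et al. in the setting of unconstrained Bandit Linear Optimization (uBLO) and shows, surprisingly, that it effectively reduces uBLO to a standard Online Linear Optimization (OLO) problem.
It derives expected-regret guarantees when the perturbation scheme is combined with comparator-adaptive OLO algorithms, and gives a new analysis of how different adversarial models affect the resulting comparator-adaptive rates.
The analysis extends to dynamic regret, achieving the optimal √P_T dependence on the path length P_T without prior knowledge of P_T.
It also presents the first high-probability guarantees for both static and dynamic regret, together with lower bounds (a lower bound on the static regret, and Ω(√(dT)) for adversarial linear bandits on the unit Euclidean ball), including results of independent theoretical interest.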
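
The reduction in the first key point rests on turning scalar bandit feedback into an estimate of the full loss vector. This summary does not give the paper's exact perturbation scheme, so the following is only a minimal sketch of the generic one-point sphere-sampling estimator, under stated assumptions: the learner holds an OLO iterate `w`, plays `x = w + delta * s` with `s` uniform on the unit sphere, observes only the scalar `<g, x>`, and forms `g_hat = (d / delta) * <g, x> * s`. Because `E[s] = 0` and `E[s s^T] = I / d`, this estimate is unbiased for `g`. All function names and the fixed radius `delta` are hypothetical choices made for illustration.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a point uniformly from the unit sphere in R^d (normalized Gaussian)."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in z))
        if norm > 0.0:
            return [v / norm for v in z]


def one_point_estimate(w, g, delta, rng):
    """Play x = w + delta * s, observe only <g, x>, and return (x, g_hat).

    g is the hidden loss vector. It is used here only to simulate the
    scalar bandit feedback; the estimator itself sees just that scalar.
    """
    d = len(w)
    s = sample_unit_sphere(d, rng)
    x = [wi + delta * si for wi, si in zip(w, s)]
    loss = sum(gi * xi for gi, xi in zip(g, x))  # scalar bandit feedback
    g_hat = [(d / delta) * loss * si for si in s]
    return x, g_hat


def average_estimate(w, g, delta, n, seed=0):
    """Average n independent estimates; this should approach g as n grows."""
    rng = random.Random(seed)
    total = [0.0] * len(w)
    for _ in range(n):
        _, g_hat = one_point_estimate(w, g, delta, rng)
        total = [t + gh for t, gh in zip(total, g_hat)]
    return [t / n for t in total]


if __name__ == "__main__":
    w = [0.5, -0.5, 1.0]   # hypothetical OLO iterate
    g = [1.0, -2.0, 0.5]   # hypothetical hidden loss vector
    print(average_estimate(w, g, delta=1.0, n=100_000))
```

In a full reduction of this kind, `g_hat` would be passed to the OLO algorithm as if it were the true loss vector. The variance of `g_hat` grows with the norm of `w` and with `d / delta`, so how the perturbation is scaled matters in practice; the sketch fixes `delta` only to keep the example short.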

Abstract

We revisit the standard perturbation-based approach of Abernethy et al. (2008) in the context of unconstrained Bandit Linear Optimization (uBLO). We show the surprising result that in the unconstrained setting, this approach effectively reduces Bandit Linear Optimization (BLO) to a standard Online Linear Optimization (OLO) problem. Our framework improves on prior work in several ways. First, we derive expected-regret guarantees when our perturbation scheme is combined with comparator-adaptive OLO algorithms, leading to new insights about the impact of different adversarial models on the resulting comparator-adaptive rates. We also extend our analysis to dynamic regret, obtaining the optimal $\sqrt{P_T}$ path-length dependencies without prior knowledge of $P_T$. We then develop the first high-probability guarantees for both static and dynamic regret in uBLO. Finally, we discuss lower bounds on the static regret, and prove the folklore