Covariance-adapting algorithm for semi-bandits with application to sparse rewards

Key Points

The paper studies stochastic combinatorial semi-bandits where the joint outcome distribution determines problem complexity, unlike standard bandits.

Abstract

We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in standard bandits). The distributions typically considered depend on specific parameter values, whose prior knowledge is required in theory but which are quite difficult to estimate in practice; an example is the commonly assumed sub-Gaussian family. We alleviate this issue by instead considering a new general family of sub-exponential distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the expected regret over this family, parameterized by the unknown covariance matrix of outcomes, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates, and provide a tight asymptotic analysis of its regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.
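The abstract does not say how the covariance estimates are formed. As a rough illustration only, and not the paper's algorithm, the sketch below assumes that in each round the learner observes the outcomes of exactly the arms in the chosen action, so a pair of arms contributes to a covariance estimate only in rounds where both were played. The class name `PairwiseCovarianceEstimator` and its methods are hypothetical, and the estimate is the plain (biased) empirical covariance with no confidence bonus.

```python
class PairwiseCovarianceEstimator:
    """Hypothetical helper: empirical covariances from semi-bandit feedback."""

    def __init__(self):
        # Per-pair statistics, keyed by (i, j) with i <= j.
        self.count = {}   # rounds in which arms i and j were both observed
        self.sum_i = {}   # sum of outcomes of arm i over those rounds
        self.sum_j = {}   # sum of outcomes of arm j over those rounds
        self.sum_ij = {}  # sum of products of the two outcomes

    def update(self, action, outcomes):
        """Record one round.

        action:   iterable of arm indices that were played (assumed distinct)
        outcomes: dict mapping each played arm to its observed outcome
        """
        arms = sorted(action)
        for pos, i in enumerate(arms):
            for j in arms[pos:]:
                key = (i, j)
                self.count[key] = self.count.get(key, 0) + 1
                self.sum_i[key] = self.sum_i.get(key, 0.0) + outcomes[i]
                self.sum_j[key] = self.sum_j.get(key, 0.0) + outcomes[j]
                self.sum_ij[key] = (
                    self.sum_ij.get(key, 0.0) + outcomes[i] * outcomes[j]
                )

    def cov(self, i, j):
        """Empirical covariance of arms i and j, or None if never co-observed."""
        key = (min(i, j), max(i, j))
        n = self.count.get(key, 0)
        if n == 0:
            return None
        mean_i = self.sum_i[key] / n
        mean_j = self.sum_j[key] / n
        return self.sum_ij[key] / n - mean_i * mean_j


# Usage: two rounds in which arms 0 and 1 are played together.
est = PairwiseCovarianceEstimator()
est.update([0, 1], {0: 1.0, 1: 2.0})
est.update([0, 1], {0: 3.0, 1: 6.0})
variance_arm0 = est.cov(0, 0)
covariance_01 = est.cov(0, 1)
never_seen = est.cov(0, 2)
```

Pairs that are never played together have no estimate at all, which is one reason an algorithm built on such estimates would need to account for how often each pair is co-observed; how the paper handles this is not stated in the abstract.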
