A Practical Algorithm for Feature-Rich, Non-Stationary Bandit Problems
arXiv cs.LG / March 18, 2026
Key Points
- The paper generalizes contextual bandits to handle dense arm features, non-linear reward functions, and time-varying but correlated reward distributions, broadening applicability to tasks like recommendations.
- It introduces Conditionally Coupled Contextual Thompson Sampling (C3TS) for Bernoulli bandits, combining an improved Nadaraya-Watson estimator on an embedding space with online Thompson sampling that requires no retraining.
- Empirical results show C3TS achieves 5.7% lower average cumulative regret on four OpenML tabular datasets and a 12.4% relative click lift on the Microsoft News Dataset (MIND) compared with competing algorithms.
- The approach emphasizes online learning practicality for real-world applications, enabling better performance without offline retraining.
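The paper itself is not reproduced here, but the key points above suggest the core mechanics: per-arm reward histories over an embedding space, Nadaraya-Watson kernel weights around the current context serving as pseudo-counts for a Beta posterior, and standard Thompson sampling on top. The sketch below illustrates that combination for Bernoulli rewards; the class name, Gaussian kernel choice, and bandwidth are assumptions for illustration, not the paper's exact C3TS algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(x, xs, bandwidth=0.5):
    # Gaussian kernel weights between the query embedding x
    # and an array of past embeddings xs (one row per observation).
    d2 = np.sum((xs - x) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

class KernelBernoulliTS:
    """Illustrative kernel-smoothed Thompson sampler for Bernoulli rewards.

    Each arm keeps its history of (embedding, reward) pairs. At decision
    time, Nadaraya-Watson kernel weights of past observations around the
    current context act as pseudo-counts for a Beta posterior, from which
    Thompson sampling draws one sample per arm. Updates are purely online:
    new observations are appended, with no offline retraining step.
    """

    def __init__(self, n_arms, dim, bandwidth=0.5):
        # Per-arm histories: (list of embeddings, list of 0/1 rewards)
        self.hist = [([], []) for _ in range(n_arms)]
        self.bandwidth = bandwidth
        self.dim = dim

    def select(self, context):
        samples = []
        for xs, rs in self.hist:
            if not xs:
                # No data yet: sample from the uniform Beta(1, 1) prior.
                samples.append(rng.beta(1.0, 1.0))
                continue
            w = gaussian_kernel(context, np.array(xs), self.bandwidth)
            r = np.array(rs, dtype=float)
            alpha = 1.0 + np.sum(w * r)          # kernel-weighted successes
            beta = 1.0 + np.sum(w * (1.0 - r))   # kernel-weighted failures
            samples.append(rng.beta(alpha, beta))
        return int(np.argmax(samples))

    def update(self, arm, context, reward):
        # Online update: append the observation to the chosen arm's history.
        xs, rs = self.hist[arm]
        xs.append(np.asarray(context, dtype=float))
        rs.append(float(reward))
```

In a simulated loop, the sampler should concentrate pulls on the arm with the higher click probability, since its kernel-weighted Beta posterior accumulates more successes:

```python
bandit = KernelBernoulliTS(n_arms=2, dim=3)
for t in range(300):
    ctx = rng.normal(size=3)
    arm = bandit.select(ctx)
    p = 0.2 if arm == 0 else 0.8  # arm 1 has the higher click probability
    bandit.update(arm, ctx, 1.0 if rng.random() < p else 0.0)
```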