Spectral bandits

arXiv stat.ML / 4/29/2026

Key Points

  • The paper studies a new bandit setting where each arm’s payoff is a smooth function over an undirected graph, making it well-suited for online graph-based learning such as content recommendation.
  • It models recommending items as selecting graph nodes whose expected ratings are similar to those of neighboring nodes, and aims to minimize cumulative regret versus the optimal policy.
  • To keep regret from scaling poorly with the number of nodes, the authors introduce an “effective dimension,” argued to be small in real-world graphs (see the sketch after this list).
  • They propose three algorithms whose regret scales linearly or sublinearly in this effective dimension.
  • Experiments on content recommendation suggest that preference estimation over thousands of items can be achieved with only tens of node evaluations.
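
To make the smoothness claim concrete, here is a minimal sketch of why the effective dimension can be small: a reward function that varies slowly across the edges of a graph concentrates almost all of its energy on the first few eigenvectors of the graph Laplacian. The path graph, the sinusoidal signal, and the 99% energy threshold below are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Path graph on N nodes: a simple stand-in for a real item-similarity graph.
N = 100
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A      # combinatorial Laplacian L = D - A

eigvals, Q = np.linalg.eigh(L)      # eigenvalues ascending, eigenvectors in columns

# A "smooth" reward function: slowly varying over the graph, so neighboring
# nodes get similar expected ratings (small quadratic form f^T L f).
f = np.sin(np.linspace(0, 2 * np.pi, N))

alpha = Q.T @ f                     # coefficients in the spectral basis
energy = np.cumsum(alpha**2) / np.sum(alpha**2)
d = int(np.searchsorted(energy, 0.99)) + 1
print(f"99% of the signal energy lies in the first {d} of {N} eigenvectors")
print(f"smoothness penalty f^T L f = {f @ L @ f:.3f}")
```

A learner that only needs to estimate those few leading coefficients requires far fewer observations than one that treats all N nodes independently, which is the intuition the regret bounds formalize.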

Abstract

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this work, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node of an undirected graph and its expected rating is similar to the ratings of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose three algorithms for solving our problem whose regret scales linearly or sublinearly in this dimension. Our experiments on a content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
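
Reading the abstract operationally: if the expected ratings are smooth on the graph, they are approximately a linear function of a few low-frequency Laplacian eigenvectors, so a linear-bandit learner with a spectral regularizer can exploit the structure. The sketch below is one such UCB-style strategy, assuming arm features given by rows of the eigenbasis and a fixed exploration coefficient `c`; the function name and the simplified confidence width are illustrative, not the paper's exact algorithms or constants.

```python
import numpy as np

def spectral_ucb(L, pull, T, lam=0.01, c=1.0):
    """UCB in the Laplacian eigenbasis (a sketch under stated assumptions).

    L    -- graph Laplacian of the item graph (N x N, symmetric PSD)
    pull -- pull(i): observe a noisy rating of node/item i
    T    -- horizon (number of recommendations)
    lam  -- ridge term added to the spectral regularizer (assumed value)
    c    -- exploration coefficient (the paper derives tighter, time-dependent widths)
    """
    N = L.shape[0]
    eigvals, Q = np.linalg.eigh(L)            # spectral basis of the graph
    V = np.diag(eigvals) + lam * np.eye(N)    # regularizer penalizing high frequencies
    b = np.zeros(N)
    picks = []
    for _ in range(T):
        V_inv = np.linalg.inv(V)
        alpha_hat = V_inv @ b                 # regularized least-squares coefficients
        mu = Q @ alpha_hat                    # estimated expected rating per node
        # Confidence width per node: sqrt(x_i^T V^{-1} x_i), x_i the i-th row of Q.
        width = np.sqrt(np.einsum('ij,jk,ik->i', Q, V_inv, Q))
        i = int(np.argmax(mu + c * width))    # optimistic (UCB) choice
        r = pull(i)
        x = Q[i]
        V += np.outer(x, x)                   # rank-one design-matrix update
        b += r * x
        picks.append(i)
    return picks
```

Because the regularizer charges each coefficient in proportion to its eigenvalue, exploration is effectively confined to the low-frequency subspace, i.e., to a space of roughly the effective dimension rather than all N nodes.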