Spectral bandits for smooth graph functions

arXiv stat.ML / 4/21/2026

Key Points

The paper studies a graph-structured multi-armed bandit in which the arms are the nodes of a graph and their expected payoffs (ratings) vary smoothly across it, so the graph acts as a prior for online learning.
The work targets recommendation problems such as content-based recommendation, modeling each recommendable item as a graph node whose expected rating is similar to those of its neighbors.
The authors introduce an “effective dimension” that is small for real-world graphs, and design algorithms whose cumulative regret scales with this dimension rather than with the total number of nodes.
Two algorithms are proposed, one scaling linearly and one sublinearly in the effective dimension, so that learning remains efficient on large graphs.
Experiments on real-world content recommendation indicate user-preference estimators for thousands of items can be learned using only tens of node evaluations.
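
The smoothness assumption above is commonly quantified with the graph Laplacian L = D - W (degree matrix minus weight matrix). The quadratic form f^T L f equals the sum over edges of w_ij (f_i - f_j)^2, so it is small exactly when neighboring nodes have similar ratings; in the Laplacian eigenbasis it becomes sum_k lambda_k (q_k . f)^2. The Python sketch below checks both forms on a toy graph and counts an effective dimension. That count assumes the definition "largest d with (d - 1) lambda_d <= T / log(1 + T / lambda)" over the sorted eigenvalues of L + lambda I. This is our reading, not something stated in the summary, so verify it against the paper. The graph, ratings, horizon T and regularizer lam are hypothetical.

```python
import numpy as np


def laplacian(weights):
    """Combinatorial graph Laplacian L = D - W for a symmetric weight matrix W."""
    w = np.asarray(weights, dtype=float)
    return np.diag(w.sum(axis=1)) - w


def smoothness(L, f):
    """Laplacian quadratic form f^T L f, i.e. sum over edges of w_ij * (f_i - f_j)^2."""
    f = np.asarray(f, dtype=float)
    return float(f @ L @ f)


def spectral_smoothness(L, f):
    """Same quantity in the eigenbasis: sum_k lambda_k * (q_k . f)^2."""
    eigvals, Q = np.linalg.eigh(L)
    coeffs = Q.T @ np.asarray(f, dtype=float)  # coordinates of f in the eigenbasis
    return float(np.sum(eigvals * coeffs**2))


def effective_dimension(eigvals, T, lam):
    """ASSUMED definition: largest d with (d - 1) * lam_d <= T / log(1 + T / lam),
    where lam_1 <= ... <= lam_N are the eigenvalues of L + lam * I.
    Check against the paper before relying on it."""
    shifted = np.sort(np.asarray(eigvals, dtype=float) + lam)
    bound = T / np.log(1.0 + T / lam)
    d = 1
    for k, value in enumerate(shifted, start=1):
        if (k - 1) * value <= bound:  # (k - 1) * lam_k is nondecreasing in k
            d = k
    return d


# Hypothetical 5-node path graph 0-1-2-3-4 with unit edge weights.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = laplacian(W)

smooth_ratings = [1.0, 1.1, 1.2, 1.3, 1.4]  # neighbors have similar ratings
rough_ratings = [1.0, 0.0, 1.0, 0.0, 1.0]   # neighbors disagree

print(smoothness(L, smooth_ratings), spectral_smoothness(L, smooth_ratings))  # both ~0.04
print(smoothness(L, rough_ratings))  # 4.0
print(effective_dimension(np.linalg.eigvalsh(L), T=10, lam=1.0))  # 2
```

The rough profile scores 100 times higher than the smooth one because every edge joins two disagreeing nodes.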

Abstract

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node, and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on a real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
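
To illustrate how a graph prior can reduce exploration, the sketch below runs a LinUCB-style loop in which node i's feature vector is row i of the Laplacian eigenvector matrix Q and the ridge regularizer is diag(lambda_k + lam), so coefficients on high-frequency (non-smooth) eigenvectors are shrunk hardest. It illustrates the general setting in the abstract and is not claimed to be either of the paper's two algorithms. The function name spectral_ucb, the exploration constant c, the regularizer lam, the noise level and the 21-node path graph are all assumptions.

```python
import numpy as np


def spectral_ucb(L, reward_fn, T, lam=1.0, c=1.0, seed=0):
    """Hypothetical LinUCB-style loop in the Laplacian eigenbasis.

    Node i is represented by row i of the eigenvector matrix Q. The initial
    design matrix diag(eigvals + lam) penalizes weight on high-frequency
    (non-smooth) eigenvectors more than on low-frequency ones.
    """
    eigvals, Q = np.linalg.eigh(L)
    V = np.diag(eigvals + lam)   # regularized design matrix
    b = np.zeros(L.shape[0])     # running sum of reward * feature
    rng = np.random.default_rng(seed)
    picks = []
    for _ in range(T):
        V_inv = np.linalg.inv(V)
        alpha_hat = V_inv @ b    # regularized least-squares coefficients
        means = Q @ alpha_hat    # estimated rating of every node
        # Confidence width ||x_i||_{V^-1} for each node's feature row x_i.
        widths = np.sqrt(np.einsum("ij,jk,ik->i", Q, V_inv, Q))
        arm = int(np.argmax(means + c * widths))
        reward = reward_fn(arm, rng)
        x = Q[arm]
        V += np.outer(x, x)      # rank-one update of the design matrix
        b += reward * x
        picks.append(arm)
    estimate = Q @ np.linalg.solve(V, b)  # final estimated ratings for all nodes
    return picks, estimate


# Hypothetical demo: 21-node path graph with a smooth rating profile that
# peaks at node 10, observed with Gaussian noise.
n = 21
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
true_ratings = np.sin(np.pi * np.arange(n) / (n - 1))


def noisy_rating(arm, rng):
    """Assumed reward model: true rating plus N(0, 0.1^2) noise."""
    return true_ratings[arm] + 0.1 * rng.standard_normal()


picks, estimate = spectral_ucb(L, noisy_rating, T=100)
print("most-picked node:", max(set(picks), key=picks.count))
print("best node:", int(np.argmax(true_ratings)))
```

Inverting V in every round costs O(N^3) per step; a practical version would instead maintain V^-1 directly with the Sherman-Morrison formula after each rank-one update.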