Learning Shared Representations for Multi-Task Linear Bandits

arXiv cs.LG / 4/2/2026

📰 News

Key Points

  • The paper studies multi-task linear bandits where T related tasks share a common low-dimensional latent representation, with a shared subspace of dimension r much smaller than d and T.
  • It introduces a new OFUL-style algorithm built as a pipeline: an exploration phase, spectral initialization to estimate the shared model from the collected data, and then OFUL learning using a confidence set built from the low-rank structure.
  • The authors provide theoretical results showing high-probability coverage of the true reward vectors by their constructed confidence set and derive cumulative regret bounds.
  • The proposed method achieves a regret of Õ(√(drNT)) and is argued to substantially improve over treating each task independently, which yields Õ(dT√N).
  • Numerical simulations are included to empirically validate performance across different problem sizes.
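To get a feel for the claimed gap, here is a quick back-of-the-envelope comparison of the two regret scalings, ignoring constants and logarithmic factors. The specific problem sizes below are illustrative choices, not values from the paper.

```python
import math

def multitask_regret(d, r, N, T):
    # Shared-representation bound: ~ sqrt(d * r * N * T), up to log factors
    return math.sqrt(d * r * N * T)

def independent_regret(d, N, T):
    # Per-task OFUL bound ~ d * sqrt(N), summed over T independent tasks
    return d * T * math.sqrt(N)

# Illustrative sizes (not from the paper): high-dimensional features,
# small shared rank, many rounds, many tasks
d, r, N, T = 100, 5, 10_000, 50
shared = multitask_regret(d, r, N, T)
indep = independent_regret(d, N, T)
print(f"shared ~ {shared:.0f}, independent ~ {indep:.0f}, "
      f"ratio ~ {indep / shared:.1f}x")
```

The ratio of the two bounds simplifies to √(dT/r), so the advantage grows with the number of tasks and the ambient dimension, and shrinks as the latent dimension r approaches d.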

Abstract

Multi-task representation learning is an approach that learns shared latent representations across related tasks, facilitating knowledge transfer and improving sample efficiency. This paper introduces a novel approach to multi-task representation learning in linear bandits. We consider a setting with T concurrent linear bandit tasks, each with feature dimension d, that share a common latent representation of dimension $r \ll \min\{d, T\}$, capturing their underlying relatedness. We propose a new Optimism in the Face of Uncertainty Linear (OFUL) algorithm that leverages shared low-rank representations to enhance decision-making in a sample-efficient manner. Our algorithm first collects data through an exploration phase, estimates the shared model via spectral initialization, and then conducts OFUL-based learning over a newly constructed confidence set. We provide theoretical guarantees for the confidence set and prove that the unknown reward vectors lie within it with high probability. We derive cumulative regret bounds and show that the proposed approach achieves regret $\tilde{O}(\sqrt{drNT})$, a significant improvement over solving the T tasks independently, which incurs regret $\tilde{O}(dT\sqrt{N})$. We perform numerical simulations to validate the performance of our algorithm across different problem sizes.
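The spectral-initialization step can be sketched as follows: estimate each task's reward vector by least squares on the exploration data, stack the estimates into a d×T matrix, and take its top-r left singular vectors as the estimated shared subspace. This is a minimal sketch of one standard spectral recipe, not the paper's exact estimator or confidence-set construction; all function and variable names here are illustrative.

```python
import numpy as np

def spectral_subspace(X_list, y_list, r, reg=1e-3):
    """Estimate a shared r-dimensional subspace from per-task exploration data.

    X_list[t]: (n_t, d) feature matrix for task t
    y_list[t]: (n_t,) observed rewards for task t
    """
    d = X_list[0].shape[1]
    # Ridge least-squares estimate of each task's reward vector
    thetas = [np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)
              for X, y in zip(X_list, y_list)]
    Theta = np.column_stack(thetas)              # (d, T) stacked estimates
    U, _, _ = np.linalg.svd(Theta, full_matrices=False)
    return U[:, :r]                              # top-r left singular vectors

# Synthetic check: tasks generated from a true rank-r shared model
rng = np.random.default_rng(0)
d, r, T, n = 20, 3, 30, 500
B = np.linalg.qr(rng.standard_normal((d, r)))[0]  # true shared subspace
X_list = [rng.standard_normal((n, d)) for _ in range(T)]
y_list = [X @ (B @ rng.standard_normal(r)) + 0.1 * rng.standard_normal(n)
          for X in X_list]
B_hat = spectral_subspace(X_list, y_list, r)
# Subspace recovery error: norm of the part of B outside the estimated span
err = np.linalg.norm((np.eye(d) - B_hat @ B_hat.T) @ B)
print(f"subspace recovery error: {err:.3f}")
```

In the downstream learning phase, the estimated subspace would be used to build a lower-dimensional confidence set for each task's reward vector, which is where the sample-efficiency gain comes from.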