Nearly Optimal Best Arm Identification for Semiparametric Bandits

arXiv stat.ML / 4/7/2026


Key Points

  • The paper studies fixed-confidence best arm identification (BAI) in semiparametric bandits where rewards are linear in arm features plus an unknown additive baseline shift, distinguishing it from standard linear-bandit BAI.
  • For the transductive case, it proves an attainable instance-dependent lower bound that matches the linear-bandit complexity computed on shifted features.
  • It introduces a computationally efficient phase-elimination algorithm using a new $XY$-design to enable orthogonalized regression in this semiparametric setting.
  • The authors derive a nearly optimal high-probability upper bound on sample complexity, with performance matching the lower bound up to logarithmic factors and an additive $d^2$ term.
  • Experiments on synthetic data and the Jester dataset report clear improvements over prior baselines.

Abstract

We study fixed-confidence Best Arm Identification (BAI) in semiparametric bandits, where rewards are linear in arm features plus an unknown additive baseline shift. Unlike linear-bandit BAI, this setting requires orthogonalized regression, and its instance-optimal sample complexity has remained open. For the transductive setting, we establish an attainable instance-dependent lower bound characterized by the corresponding linear-bandit complexity on shifted features. We then propose a computationally efficient phase-elimination algorithm based on a new $XY$-design for orthogonalized regression. Our analysis yields a nearly optimal high-probability sample-complexity upper bound, up to log factors and an additive $d^2$ term, and experiments on synthetic instances and the Jester dataset show clear gains over prior baselines.
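To make the semiparametric model concrete, the sketch below simulates rewards that are linear in arm features plus a shared per-round baseline shift, then contrasts naive least squares (which absorbs the drifting baseline as bias) with a simple centered estimator, one elementary form of orthogonalized regression. This is an illustrative toy, not the paper's $XY$-design algorithm; all dimensions, the baseline process, and the uniform sampling scheme are assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 3, 20, 20000

theta = rng.normal(size=d)          # unknown linear parameter (illustrative)
X = rng.normal(size=(K, d))         # arm feature vectors, one row per arm

# Per-round additive baseline shift, shared by every arm in that round.
# It may drift arbitrarily over time -- the semiparametric nuisance.
b = 2.0 + np.sin(np.arange(T) / 200.0)

arms = rng.integers(0, K, size=T)   # uniform exploration, for illustration only
x = X[arms]
r = x @ theta + b + 0.1 * rng.normal(size=T)

# Orthogonalized regression (toy version): center each pulled feature by the
# mean feature of the sampling distribution. Since the baseline is shared
# across arms and independent of the (non-adaptive) arm draws here,
# E[(x_t - mu) * b_t] = 0, so the baseline's influence cancels.
mu = X.mean(axis=0)
xc = x - mu
theta_hat = np.linalg.solve(xc.T @ xc, xc.T @ r)

# Naive least squares on raw features picks up a bias term proportional
# to mu times the average baseline.
theta_naive = np.linalg.solve(x.T @ x, x.T @ r)
```

With the nonzero-mean baseline above, `theta_hat` stays close to `theta`, while `theta_naive` is pulled away from it; best-arm identification then proceeds on the (consistent) centered estimate. In the paper's setting the exploration distribution is chosen by an optimal design rather than uniformly, but the centering idea is the same.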