Differentially Private Best-Arm Identification

arXiv stat.ML / 4/9/2026


Key Points

  • The paper studies best-arm identification (BAI) with fixed confidence under differential privacy, in both the local model (ε-local DP) and the central model (ε-global DP).
  • It derives lower bounds on the sample complexity of any δ-correct BAI algorithm satisfying ε-global or ε-local DP, identifying distinct “high-privacy” and “low-privacy” regimes.
  • In the high-privacy regime, the hardness depends on a coupled effect of privacy and novel information-theoretic quantities involving the total variation distance; in the low-privacy regime, the bounds reduce to the non-private ones.
  • The authors propose ε-local and ε-global DP variants of a Top Two algorithm, named CTB-TT and AdaP-TT* respectively; CTB-TT plugs in a Randomised Response-based private mean estimator and is asymptotically optimal.
  • For ε-global DP, the private mean estimator runs in arm-dependent adaptive episodes and adds Laplace noise; by adapting the transportation costs, the expected sample complexity of AdaP-TT* matches the asymptotic lower bound up to multiplicative constants.
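To make the local-DP ingredient concrete, here is a minimal sketch of a Randomised Response-based private mean estimator of the kind CTB-TT plugs in. It assumes binary (Bernoulli) rewards and is an illustration of the general mechanism, not the paper's exact estimator; the function names are ours.

```python
import math
import random

def randomized_response(reward, eps):
    """Privatise one binary reward under eps-local DP.

    Reports the true bit with probability e^eps / (1 + e^eps),
    and the flipped bit otherwise.
    """
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    return reward if random.random() < p_keep else 1 - reward

def debiased_mean(private_obs, eps):
    """Unbiased estimate of the true mean mu from privatised bits.

    Since E[obs] = p*mu + (1-p)*(1-mu) = (2p-1)*mu + (1-p),
    inverting gives mu = (raw_mean - (1-p)) / (2p - 1).
    """
    p = math.exp(eps) / (1.0 + math.exp(eps))
    raw = sum(private_obs) / len(private_obs)
    return (raw - (1.0 - p)) / (2.0 * p - 1.0)
```

Each user's raw reward never leaves their device unperturbed, which is what makes the guarantee local; the server only ever sees the flipped bits and debiases in aggregate.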

Abstract

Best Arm Identification (BAI) problems are progressively used for data-sensitive applications, such as designing adaptive clinical trials, tuning hyper-parameters, and conducting user studies. Motivated by the data privacy concerns raised by these applications, we study the problem of BAI with fixed confidence in both the local and central models, i.e. ε-local and ε-global Differential Privacy (DP). First, to quantify the cost of privacy, we derive lower bounds on the sample complexity of any δ-correct BAI algorithm satisfying ε-global DP or ε-local DP. Our lower bounds suggest the existence of two privacy regimes. In the high-privacy regime, the hardness depends on a coupled effect of privacy and novel information-theoretic quantities involving the Total Variation. In the low-privacy regime, the lower bounds reduce to the non-private lower bounds. We propose ε-local DP and ε-global DP variants of a Top Two algorithm, namely CTB-TT and AdaP-TT*, respectively. For ε-local DP, CTB-TT is asymptotically optimal by plugging in a private estimator of the means based on Randomised Response. For ε-global DP, our private estimator of the mean runs in arm-dependent adaptive episodes and adds Laplace noise to ensure a good privacy-utility trade-off. By adapting the transportation costs, the expected sample complexity of AdaP-TT* reaches the asymptotic lower bound up to multiplicative constants.
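For the central model, the abstract describes a mean estimator that adds Laplace noise to per-episode statistics. Below is a simplified sketch of that building block under the common assumption of rewards in [0, 1]: the mean of n rewards has sensitivity 1/n, so Laplace noise of scale 1/(n·ε) makes the released episode mean ε-DP. This illustrates the Laplace mechanism only; AdaP-TT*'s arm-dependent episode schedule and forgetting are not reproduced here, and the function names are ours.

```python
import math
import random

def laplace_noise(scale):
    """Sample from a centred Laplace distribution via the inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def private_episode_mean(rewards, eps):
    """Release an eps-DP estimate of the mean reward of one episode.

    Assumes each reward lies in [0, 1], so changing a single reward
    moves the mean by at most 1/n; Laplace(1/(n*eps)) noise suffices.
    """
    n = len(rewards)
    return sum(rewards) / n + laplace_noise(1.0 / (n * eps))
```

Running the estimator over episodes (rather than per sample) is what keeps the privacy-utility trade-off favourable: longer episodes shrink the sensitivity, and hence the added noise, linearly in the episode length.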