Sequential 1-bit Mean Estimation with Near-Optimal Sample Complexity

arXiv stat.ML / 4/7/2026


Key Points

  • The paper addresses distributed mean estimation under a strict 1-bit communication constraint, using randomized, sequential interval queries whose single-bit answers indicate whether samples fall within queried ranges.
  • It proves PAC guarantees for distributions with bounded mean and variance and derives a near-minimax sample complexity bound of Õ((σ²/ε²)·log(1/δ) + log(λ/σ)).
  • The authors show the derived rate essentially matches the unquantized (real-valued) minimax benchmark up to logarithmic factors, and they argue the additional log(λ/σ) term is unavoidable.
  • The work establishes an adaptivity gap, demonstrating that adaptive interval-query estimators can substantially outperform the best non-adaptive estimators when λ/σ is large.
  • The paper further provides tightened bounds for lighter-tailed distributions and multiple algorithm variants for unknown budgets, unknown variance within bounds, and a reduced (two-stage) adaptivity setting using more complex queries.
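To make the query model concrete, here is a minimal illustrative sketch (not the paper's actual algorithm) of a two-stage 1-bit interval-query estimator: stage 1 bisects over [-λ, λ] using half-line queries "is X ≤ t?" to localize the median (within σ of the mean when Var(X) ≤ σ²), and stage 2 uses the standard randomized-threshold trick: for U uniform on [a, b] and X clipped to [a, b], E[1{X ≤ U}] = (b − E[X])/(b − a), which inverts to a mean estimate. All function names, batch sizes, and the 8σ padding are assumptions for illustration.

```python
import numpy as np

def one_bit_query(x, lo, hi):
    """The only access to a sample: one bit saying whether x lies in [lo, hi]."""
    return lo <= x <= hi

def estimate_mean_1bit(draw, lam, sigma, n_bisect_batch, n_refine, rng):
    """Illustrative two-stage 1-bit estimator (a sketch, not the paper's method).

    draw: callable returning one fresh sample per call (each sample is
    consumed by exactly one 1-bit query, matching the communication model).
    """
    # Stage 1: adaptive bisection on [-lam, lam] using queries X <= t.
    # The empirical fraction of "yes" bits approximates the CDF at t, so the
    # search converges near the median, which is within sigma of the mean.
    lo, hi = -lam, lam
    while hi - lo > sigma:  # O(log(lam/sigma)) rounds of adaptivity
        t = 0.5 * (lo + hi)
        hits = sum(one_bit_query(draw(), -np.inf, t) for _ in range(n_bisect_batch))
        if hits / n_bisect_batch < 0.5:
            lo = t
        else:
            hi = t

    # Stage 2: randomized thresholds on a padded window [a, b]. The 8*sigma
    # padding keeps the clipping bias small (Chebyshev: mass beyond 8 sigma
    # from the mean is at most 1/64).
    a, b = lo - 8.0 * sigma, hi + 8.0 * sigma
    thresholds = rng.uniform(a, b, size=n_refine)
    bits = np.array([one_bit_query(draw(), -np.inf, u) for u in thresholds])
    # E[bit] = (b - E[X]) / (b - a) for X in [a, b]; invert for the mean.
    return b - (b - a) * bits.mean()
```

Note how the estimator only ever sees single bits, yet the adaptive bisection stage is what shrinks the search from width λ down to O(σ), mirroring the log(λ/σ) term in the paper's bound.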

Abstract

In this paper, we study the problem of distributed mean estimation with 1-bit communication constraints. We propose a mean estimator based on (randomized and sequentially chosen) interval queries, whose 1-bit outcome indicates whether the given sample lies in the specified interval. Our estimator is $(\epsilon, \delta)$-PAC for all distributions with bounded mean ($-\lambda \le \mathbb{E}(X) \le \lambda$) and variance ($\mathrm{Var}(X) \le \sigma^2$) for some known parameters $\lambda$ and $\sigma$. We derive a sample complexity bound $\widetilde{O}\big( \frac{\sigma^2}{\epsilon^2}\log\frac{1}{\delta} + \log\frac{\lambda}{\sigma}\big)$, which matches the minimax lower bound for the unquantized setting up to logarithmic factors and the additional $\log\frac{\lambda}{\sigma}$ term that we show to be unavoidable. We also establish an adaptivity gap for interval-query-based estimators: the best non-adaptive mean estimator is considerably worse than our adaptive mean estimator for large $\frac{\lambda}{\sigma}$. Finally, we give tightened sample complexity bounds for distributions with stronger tail decay, and present additional variants that (i) handle an unknown sampling budget, (ii) adapt to the unknown true variance given (possibly loose) upper and lower bounds on the variance, and (iii) use only two stages of adaptivity at the expense of more complicated (non-interval) queries.
