AI Navigate

Few Batches or Little Memory, But Not Both: Simultaneous Space and Adaptivity Constraints in Stochastic Bandits

arXiv cs.LG / 3/17/2026


Key Points

  • It studies stochastic multi-armed bandits under simultaneous space (W bits of memory) and adaptivity (B batches) constraints, showing that this jointly constrained regime is substantially harder than when each constraint is imposed separately.
  • It proves a lower bound: any algorithm with W bits of memory must use at least Ω(K/W) batches to achieve near-minimax regret Õ(√(KT)), even when using adaptive grids.
  • It shows that algorithms restricted to logarithmic memory cannot achieve K-independent batch complexity under these combined constraints.
  • It introduces an information bottleneck argument: near-minimax regret requires acquiring Ω(K) bits of information about the good-arm set, while an algorithm with B batches and W bits of memory can acquire only O(BW) bits.
  • It provides an algorithm that uses O(log T) bits of memory and Õ(K) batches to achieve regret Õ(√(KT)), nearly matching the derived lower bound.
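
The lower bound above follows from simple budget arithmetic: if near-minimax regret demands Ω(K) bits of information and each batch can commit at most W bits to memory, then B·W ≥ Ω(K), i.e. B ≥ Ω(K/W). A minimal sketch of this order-level calculation (constants suppressed; `min_batches_lower_bound` is an illustrative name, not from the paper):

```python
import math

def min_batches_lower_bound(K: int, W: int) -> int:
    """Order-level lower bound on the number of batches B.

    The information bottleneck argument gives B * W >= Omega(K),
    hence B >= Omega(K / W). Constants are suppressed, so this is
    only the order of magnitude, not an exact constant.
    """
    return math.ceil(K / W)

# Example: with K = 1024 arms and W = 64 bits of persistent memory,
# on the order of 16 batches are unavoidable.
print(min_batches_lower_bound(1024, 64))  # → 16
```

In particular, W = O(log T) memory forces B = Ω(K / log T) batches, which is why logarithmic memory rules out K-independent batch complexity.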

Abstract

We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in B batches and has only W bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret \widetilde{O}(\sqrt{KT}) is achievable with O(\log T) bits of memory under fully adaptive interaction, and with a K-independent O(\log\log T)-type number of batches when memory is unrestricted. We show that this picture breaks down in the simultaneously constrained regime. We prove that any algorithm with a W-bit memory constraint must use at least \Omega(K/W) batches to achieve near-minimax regret \widetilde{O}(\sqrt{KT}), even under adaptive grids. In particular, logarithmic memory rules out K-independent batch complexity. Our proof is based on an information bottleneck. We show that near-minimax regret forces the learner to acquire \Omega(K) bits of information about the hidden set of good arms under a suitable hard prior, whereas an algorithm with B batches and W bits of memory allows only O(BW) bits of information. A key ingredient is a localized change-of-measure lemma that yields probability-level arm exploration guarantees, which is of independent interest. We also give an algorithm using O(\log T) bits of memory and \widetilde{O}(K) batches that achieves regret \widetilde{O}(\sqrt{KT}), which nearly matches our lower bound.
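
To make the upper-bound regime concrete, here is a minimal sketch of a low-memory batched strategy, NOT the paper's algorithm: scan the K arms one batch at a time, retaining only the index and empirical mean of the current best arm. This stores O(1) values of O(\log T) bits each while using about K batches, matching the shape (memory O(\log T), batches \widetilde{O}(K)) of the regime the abstract describes. The function name `champion_scan` and the reward interface are assumptions for illustration.

```python
def champion_scan(arms, pulls_per_arm):
    """Illustrative sketch only: identify a good arm with O(1) stored
    values (one index, one running mean) across ~K batches.

    `arms` is a list of K zero-argument callables, each returning a
    stochastic reward on one pull; `pulls_per_arm` is the batch size.
    """
    champ_idx, champ_mean = None, float("-inf")
    for i, arm in enumerate(arms):  # one batch per arm
        total = sum(arm() for _ in range(pulls_per_arm))
        mean = total / pulls_per_arm
        if mean > champ_mean:  # keep only the best arm seen so far
            champ_idx, champ_mean = i, mean
    return champ_idx

# Usage with (deterministic, for illustration) reward functions:
best = champion_scan([lambda: 0.2, lambda: 0.9, lambda: 0.4], pulls_per_arm=10)
print(best)  # → 1
```

The point of the sketch is the memory/adaptivity trade-off, not regret optimality: the persistent state is a single (index, mean) pair, but the batch count scales linearly with K, consistent with the \Omega(K/W) lower bound when W = O(\log T).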