Partial Feedback Online Learning

arXiv stat.ML / 4/3/2026


Key Points

  • The paper proposes a new learning protocol called partial-feedback online learning, where each input has multiple acceptable labels but the learner receives only one acceptable label per round.
  • It shows that classical version-space methods do not directly apply and introduces a collection version space to enable learnability analysis in this partial-feedback setting.
  • The authors derive tight learnability and minimax-regret characterizations in the set-realizable regime using two new complexity measures: the Partial-Feedback Littlestone dimension (PFLdim) and the Partial-Feedback Measure Shattering dimension (PMSdim).
  • They identify a nested inclusion condition that makes deterministic and randomized learnability coincide, resolving an open question from Raman et al. (2024b).
  • Beyond set realizability, the paper demonstrates a strong limitation: minimax regret can be linear even when the hypothesis space has size 2, indicating a fundamental barrier to extending the favorable theory.
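To make the protocol concrete, here is a minimal sketch of one round structure consistent with the description above. Everything here is illustrative: the `MemorizingLearner`, the `partial_feedback_game` driver, and the random choice of revealed label are my own toy constructions, not the paper's algorithms. The key feature is that the learner is judged against the full acceptable set but only ever sees a single acceptable label.

```python
import random

class MemorizingLearner:
    """Toy learner (hypothetical): predicts the last acceptable label
    it was shown for an instance, or a default label otherwise."""
    def __init__(self, default=0):
        self.memory = {}
        self.default = default

    def predict(self, x):
        return self.memory.get(x, self.default)

    def update(self, x, y_revealed):
        self.memory[x] = y_revealed

def partial_feedback_game(learner, instances, acceptable, T):
    """Run T rounds of a partial-feedback interaction (sketch):
    a mistake is counted iff the prediction is NOT in the acceptable
    set, but the feedback is only ONE acceptable label per round."""
    mistakes = 0
    for t in range(T):
        x = instances[t]
        y_hat = learner.predict(x)
        if y_hat not in acceptable[x]:
            mistakes += 1
        # Feedback: a single acceptable label; here chosen at random,
        # though in the adversarial model it may be chosen adversarially.
        y_revealed = random.choice(sorted(acceptable[x]))
        learner.update(x, y_revealed)
    return mistakes
```

For instance, if instance `"a"` accepts labels `{1, 2}` and the learner defaults to `0`, it errs once and then stays within the acceptable set, since either revealed label is acceptable on later rounds.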

Abstract

We study a new learning protocol, termed partial-feedback online learning, where each instance admits a set of acceptable labels, but the learner observes only one acceptable label per round. We highlight that, while the classical version space is widely used in online learnability analysis, it does not directly extend to this setting. We address this obstacle by introducing a collection version space, which maintains sets of hypotheses rather than individual hypotheses. Using this tool, we obtain a tight characterization of learnability in the set-realizable regime. In particular, we define the Partial-Feedback Littlestone dimension (PFLdim) and the Partial-Feedback Measure Shattering dimension (PMSdim), and show that they tightly characterize the minimax regret for deterministic and randomized learners, respectively. We further identify a nested inclusion condition under which deterministic and randomized learnability coincide, resolving an open question of Raman et al. (2024b). Finally, given a hypothesis space H, we show that beyond set realizability, the minimax regret can be linear even when |H| = 2, a barrier to extending the favorable set-realizable theory.
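One plausible way to see the obstacle the abstract mentions is the following toy example (my own illustration, not the paper's construction): the classical version-space update keeps only hypotheses that exactly match the observed label, which under partial feedback can eliminate hypotheses that predicted a different but equally acceptable label.

```python
# Two hypotheses, each predicting a single label on instance "x".
h1 = {"x": "cat"}
h2 = {"x": "dog"}
H = [h1, h2]

# Both labels are acceptable for "x", but the learner is shown only one.
acceptable = {"x": {"cat", "dog"}}
y_revealed = "cat"

# Classical version-space update: keep only exact matches.
naive_vs = [h for h in H if h["x"] == y_revealed]

# h2 is wrongly pruned even though its prediction "dog" is acceptable.
assert naive_vs == [h1]
assert h2["x"] in acceptable["x"]
```

This is why a set-aware update must retain every hypothesis whose prediction could still be acceptable, motivating a structure that tracks sets of hypotheses (the paper's collection version space) rather than pruning on exact label matches.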