Online Conformal Prediction with Adversarial Semi-bandit Feedback via Regret Minimization

arXiv stat.ML / 4/21/2026


Key Points

  • The paper addresses online uncertainty quantification in safety-critical systems, focusing on online conformal prediction where data arrive sequentially and prediction sets are updated at each step.
  • It extends online conformal prediction from a standard full-feedback setting to a more difficult partial-feedback scenario where the true label is revealed only if it falls inside the constructed prediction set, modeled as an adaptive adversary.
  • The authors reformulate online conformal prediction as an adversarial bandit problem, treating each candidate prediction set as an “arm” and building on an existing adversarial bandit algorithm.
  • The proposed approach provides a long-run coverage guarantee by explicitly linking performance to regret minimization, and experiments show it controls miscoverage while keeping prediction set sizes reasonable in both i.i.d. and non-i.i.d. conditions.

Abstract

Uncertainty quantification is crucial in safety-critical systems, where decisions must be made under uncertainty. In particular, we consider the problem of online uncertainty quantification, where data points arrive sequentially. Online conformal prediction is a principled online uncertainty quantification method that dynamically constructs a prediction set at each time step. While existing methods for online conformal prediction provide long-run coverage guarantees without any distributional assumptions, they typically assume a full-feedback setting in which the true label is always observed. In this paper, we propose a novel learning method for online conformal prediction with partial feedback from an adaptive adversary: a more challenging setup where the true label is revealed only when it lies inside the constructed prediction set. Specifically, we formulate online conformal prediction as an adversarial bandit problem by treating each candidate prediction set as an arm. Building on an existing algorithm for adversarial bandits, our method achieves a long-run coverage guarantee by explicitly establishing its connection to the regret of the learner. Finally, we empirically demonstrate the effectiveness of our method in both independent and identically distributed (i.i.d.) and non-i.i.d. settings, showing that it successfully controls the miscoverage rate while maintaining a reasonable prediction set size.
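To make the bandit formulation concrete, here is a minimal, purely illustrative sketch (not the paper's actual algorithm) of the idea: a grid of candidate confidence thresholds serves as the arms, an EXP3-style adversarial-bandit learner picks a threshold each round, and the label's nonconformity score is only usable when the label falls inside the chosen prediction set. The threshold grid, loss function, and simulated score distribution are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate thresholds act as the bandit's "arms": threshold tau defines
# the prediction set {y : score(x, y) <= tau}, so larger tau means a
# larger set and higher coverage. (Grid and constants are illustrative.)
K = 10
taus = np.linspace(0.05, 0.95, K)
gamma = 0.1           # uniform exploration rate (keeps probabilities bounded)
eta = 0.05            # learning rate
weights = np.ones(K)

T = 5000
miscover = 0

for t in range(T):
    probs = (1 - gamma) * weights / weights.sum() + gamma / K
    k = rng.choice(K, p=probs)
    tau = taus[k]

    # Simulated environment: the true label's nonconformity score.
    score = rng.uniform()
    covered = score <= tau
    miscover += not covered

    # Partial feedback: on a miss we only learn the label was outside the
    # set. Bounded loss in [0, 1]: miscoverage costs 1; otherwise pay tau
    # as a set-size proxy (an assumed, illustrative trade-off).
    loss = tau if covered else 1.0

    # Importance-weighted EXP3 update for the pulled arm only.
    weights[k] *= np.exp(-eta * loss / probs[k])
    weights /= weights.max()   # rescale to avoid numerical underflow

print(f"empirical miscoverage: {miscover / T:.3f}")
```

The exploration term `gamma / K` bounds the importance weights `loss / probs[k]`, which keeps the exponential update numerically stable; the paper's method additionally ties the learner's regret to a long-run coverage guarantee, which this toy loop does not attempt to reproduce.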