The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification

arXiv stat.ML / 5/5/2026


Key Points

  • The paper studies sequential (round-by-round) decision-making for safe online binary classification when the underlying risk follows an unknown logistic model.
  • At each round, the decision maker can either pay for a test that reveals the true label or predict the outcome from patient features and past data. The goal is to minimize costly tests while keeping the misclassification rate below a target α with probability at least 1−δ.
  • The authors propose a method that jointly estimates the logistic parameter and the feature distribution, using a conservative threshold on the logistic score to decide when additional testing is necessary.
  • They prove that the procedure meets the error constraint with high probability and that it requires only Õ(√T) more tests than an oracle with full knowledge. This is a "no-regret" guarantee in the sense that the excess testing cost grows sublinearly in T.
  • The work includes simulations demonstrating both safe patient classification and efficient parameter estimation, with direct relevance to medical screening workflows.
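
The test-or-predict loop described in the points above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm: the ground-truth parameter, the Gaussian features, the warm-up length, and the margin `c * sqrt(d / n_tests)` are all hypothetical choices, and the rule uses a simplified per-patient threshold rather than the paper's joint estimate of the feature distribution. Under the true logistic model, predicting the sign of a score s errs with probability σ(−|s|), which is at most α exactly when |s| ≥ log((1−α)/α); the sketch adds a shrinking margin to that threshold to stay conservative while the parameter estimate is still noisy.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_logistic(X, y, ridge=1.0, iters=10):
    """Ridge-regularized logistic fit via Newton's method (illustrative)."""
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (y - p) - ridge * theta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(d)
        theta = theta + np.linalg.solve(hess, grad)
    return theta


def decide(score, tau, margin):
    """Return 1 or 0 to predict, or None to request a test."""
    if abs(score) >= tau + margin:
        return int(score > 0)
    return None


def simulate(T=1000, alpha=0.1, warmup=20, c=2.0, seed=0):
    rng = np.random.default_rng(seed)
    theta_star = np.array([2.0, -1.0, 0.5])  # hypothetical ground truth
    d = theta_star.size
    tau = np.log((1.0 - alpha) / alpha)  # per-patient error <= alpha beyond tau
    X, y = [], []
    theta_hat = np.zeros(d)
    tests = predictions = errors = 0
    for _ in range(T):
        x = rng.normal(size=d)  # assumed feature distribution
        label = int(rng.random() < sigmoid(x @ theta_star))
        action = None  # always test during the warm-up phase
        if tests >= warmup:
            margin = c * np.sqrt(d / tests)  # heuristic, shrinks with more tests
            action = decide(x @ theta_hat, tau, margin)
        if action is None:
            # Pay for a test, observe the label, and refit the estimate.
            X.append(x)
            y.append(label)
            tests += 1
            theta_hat = fit_logistic(np.array(X), np.array(y, dtype=float))
        else:
            predictions += 1
            errors += int(action != label)
    return {"tests": tests, "predictions": predictions, "errors": errors}


if __name__ == "__main__":
    print(simulate())
```

Calling `simulate()` returns the number of tests purchased, the number of predictions made, and the number of wrong predictions. The margin constant `c` trades off safety against testing cost in this toy version; the paper's actual threshold and confidence widths are not reproduced here.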

Abstract

We study sequential testing for a binary disease outcome when risk follows an unknown logistic model. At each round, the decision maker may either pay for a test revealing the true label or predict the outcome based on patient features and past data. The goal is to minimize costly tests while ensuring the misclassification rate stays below $\alpha$ with probability at least $1-\delta$. We propose a method that jointly estimates the logistic parameter $\theta^{\star}$ and the feature distribution, using a conservative threshold on the logistic score to decide when to test. We prove our procedure achieves the target error with high probability and requires only $\widetilde O(\sqrt{T})$ more tests than an oracle with full knowledge. This is the first no-regret guarantee for error-constrained logistic testing, with direct applications to medical screening. Simulations corroborate our theoretical results, showing safe classification of patients and efficient estimation of $\theta^{\star}$ with few excess tests.
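
For readers who prefer symbols, the setting in the abstract can be written out roughly as follows. The notation here is an assumption made for illustration and may differ from the paper's: $X_t$ and $Y_t$ denote the features and label at round $t$, the score is taken to be linear in the features, $N_T$ and $N_T^{\mathrm{oracle}}$ count the tests purchased by the procedure and by the oracle over $T$ rounds, and $\mathrm{Err}_T$ stands for whichever misclassification rate the paper constrains.

```latex
% Logistic risk model with unknown parameter \theta^{\star}
\Pr\left(Y_t = 1 \mid X_t = x\right) = \sigma\!\left(x^{\top}\theta^{\star}\right),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Safety constraint: error below \alpha with probability at least 1 - \delta
\Pr\left(\mathrm{Err}_T \le \alpha\right) \ge 1 - \delta

% No-regret guarantee: excess tests over the oracle grow sublinearly in T
N_T - N_T^{\mathrm{oracle}} = \widetilde O\!\left(\sqrt{T}\right)
```

Since $\widetilde O(\sqrt{T})/T \to 0$, the average number of excess tests per round vanishes as $T$ grows, which is what the "no-regret" label refers to.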