Minimizing Human Intervention in Online Classification

arXiv stat.ML / 5/4/2026


Key Points

  • The paper studies how to reduce costly human expert calls in LLM-based online classification using an active learning setup with an expert-vs-guessing agent and an oracle benchmark.
  • It shows that when the time horizon T is sufficiently large (at least exponential in the embedding dimension d), the class-region geometry can be learned and a Conservative Hull-based Classifier (CHC) can use convex hulls to decide when to ask for expert labels.
  • CHC achieves O(log^d T) regret and is minimax optimal in the one-dimensional case (d = 1); outside this large-horizon regime, the paper argues that the class-region geometry cannot be reliably learned in general.
  • For a more limited regime (subgaussian mixture queries and T ≤ e^d), the authors propose a Center-based Classifier (CC) whose regret scales as N log N, where N is the number of labels.
  • To work across regimes, they introduce the Generalized Hull-based Classifier (GHC), a practical CHC extension that uses a tunable parameter to allow more aggressive guessing, and validate it on real-world QA datasets with strong text embedding models.
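The CHC decision rule, specialized to the minimax-optimal d = 1 case where each class's convex hull is just an interval of expert-labeled queries, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation; the names `chc_step` and `expert` are assumptions.

```python
from typing import Callable, Dict, Tuple

def chc_step(hulls: Dict[str, Tuple[float, float]],
             query: float,
             expert: Callable[[float], str]) -> Tuple[str, bool]:
    """Label one query under CHC with d = 1 (hulls are intervals).

    Guess for free if the query lies inside a known class interval;
    otherwise pay for an expert call and extend that class's hull.
    Returns (label, expert_was_called).
    """
    for label, (lo, hi) in hulls.items():
        if lo <= query <= hi:
            return label, False        # inside a known hull: guess, no cost
    label = expert(query)              # outside all hulls: costly expert call
    lo, hi = hulls.get(label, (query, query))
    hulls[label] = (min(lo, query), max(hi, query))  # grow the hull
    return label, True
```

Because hulls only grow, the number of expert calls is driven by how often a query falls outside all learned regions, which shrinks as the geometry is covered; this is the conservative behavior behind the O(log^d T) regret bound.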

Abstract

Training or fine-tuning large language model (LLM)-based systems often requires costly human feedback, yet there is limited understanding of how to minimize such intervention while maintaining strong error guarantees. We study this problem for LLM-based classification systems in an active learning framework: an agent sequentially labels d-dimensional query embeddings drawn i.i.d. from an unknown distribution by either calling a costly expert or guessing with no feedback, with the goal of minimizing regret relative to an oracle with free expert access. When the horizon T is at least exponential in the embedding dimension d, the geometry of the class regions can be learned. In this regime, we propose the Conservative Hull-based Classifier (CHC), which maintains convex hulls of expert-labeled queries and calls the expert when a query lands outside all known hulls. CHC attains \mathcal{O}(\log^d T) regret in T and is minimax optimal for d=1. Otherwise, the geometry cannot be reliably learned in general. We show that for queries drawn from a subgaussian mixture and T \le e^d, a Center-based Classifier (CC) achieves regret proportional to N\log{N} where N is the number of labels. To bridge these regimes, we introduce the Generalized Hull-based Classifier (GHC), a practical extension of CHC that enables more aggressive guessing via a tunable parameter. Our approach is validated on real-world question-answering datasets using state-of-the-art text embedding models.
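One plausible reading of GHC's tunable parameter, again sketched for d = 1, is a slack factor that inflates each learned hull so the agent guesses on queries slightly outside it; setting the slack to zero recovers CHC's conservative rule. The function name `ghc_guess` and the inflation scheme are assumptions for illustration, not the paper's exact construction.

```python
from typing import Dict, Optional, Tuple

def ghc_guess(hulls: Dict[str, Tuple[float, float]],
              query: float,
              gamma: float) -> Optional[str]:
    """Return the label of the first gamma-inflated hull containing the
    query, or None to signal that the expert should be called.

    gamma = 0 reduces to the conservative CHC membership test; larger
    gamma trades expert calls for riskier guesses.
    """
    for label, (lo, hi) in hulls.items():
        slack = gamma * (hi - lo)      # inflate proportionally to hull width
        if lo - slack <= query <= hi + slack:
            return label
    return None
```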