Imbalanced Classification under Capacity Constraints

arXiv stat.ML / 5/6/2026

Key Points

  • The paper tackles imbalanced classification, where the minority (positive) class is underrepresented and confirming a potential positive triggers costly follow-up actions under limited operational capacity.
  • It introduces a framework for sequential/online decision-making that enforces a user-defined bound on the proportion (rate) of observations labeled as positive while maximizing detection performance.
  • The method can be implemented with standard learning techniques and extends naturally to real-time settings where predictions are made as data arrives.
  • Experiments indicate that explicitly modeling capacity constraints yields substantial gains over classical baselines, including resampling approaches like SMOTE that do not directly control the positive selection rate.
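
One simple way to enforce a bound on the positive-prediction rate, in the spirit of the framework described above, is to threshold classifier scores at an empirical quantile so that at most a fraction `alpha` of observations is flagged. This is an illustrative sketch only, not the paper's actual method; the synthetic data, the logistic-regression scorer, and the quantile rule are all our own assumptions.

```python
# Illustrative sketch (not the paper's method): bound the fraction of
# positive predictions at alpha by thresholding scores at the
# empirical (1 - alpha) quantile.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic imbalanced data: positives are a small minority.
n = 2000
X = rng.normal(size=(n, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 1.6).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]

alpha = 0.05  # capacity: flag at most 5% of observations
threshold = np.quantile(scores, 1 - alpha)
flagged = scores > threshold  # strict > keeps the rate at or below alpha

print(f"flagged fraction: {flagged.mean():.3f}")
```

Resampling baselines such as SMOTE change the training distribution but leave the number of positive predictions uncontrolled; a quantile threshold of this kind makes the capacity bound explicit.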

Abstract

In many classification settings, the class of primary interest is underrepresented, leading to imbalanced data problems that arise in applications such as rare disease detection and fraud identification. In these contexts, identifying a potential positive instance typically triggers costly follow-up actions, such as medical imaging or detailed transaction inspection, which are subject to limited operational capacity. Motivated by this setting, we consider classification problems where data may arrive sequentially and decisions must be made under constraints on the number of instances that can be selected for further analysis. We propose a classification framework that explicitly controls the rate of positive predictions, enforcing a user-defined bound on the proportion of observations classified as belonging to the minority class while maximizing detection performance. The approach can be implemented using standard learning methods and naturally extends to online settings, where decisions are taken in real time. We show that incorporating capacity constraints leads to substantial improvements over classical approaches, including resampling techniques such as SMOTE, which do not directly control the selection rate.
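
For the online setting the abstract mentions, where decisions must be taken as data arrives, one plausible mechanism is a Robbins-Monro style update that tracks the (1 - alpha) score quantile without storing past observations, so the long-run flagged fraction stays near alpha. The step size, the Gaussian score stream, and the update rule below are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical online sketch: a stochastic-approximation update that
# tracks the (1 - alpha) quantile of a score stream, so roughly an
# alpha fraction of arrivals is flagged in the long run.
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.05      # long-run bound on the flagged fraction
eta = 0.01        # step size for the quantile tracker
threshold = 0.0   # running estimate of the (1 - alpha) quantile

flagged = 0
T = 100_000
for _ in range(T):
    score = rng.normal()       # stand-in for a classifier score
    flag = score > threshold
    flagged += flag
    # Stochastic gradient step on the pinball loss: at the fixed
    # point, P(score > threshold) = alpha.
    threshold += eta * (flag - alpha)

rate = flagged / T
print(f"empirical flag rate: {rate:.4f}")  # close to alpha
```

For standard normal scores the tracker settles near the 95th percentile (about 1.64), and the empirical flag rate concentrates around the target alpha after a short transient.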