Correcting Performance Estimation Bias in Imbalanced Classification with Minority Subconcepts

arXiv cs.AI / 4/30/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper argues that class-level evaluation in imbalanced classification can hide large performance gaps across different subconcepts within the same class.
It notes prior mitigation methods rely on true subconcept labels at test time, which are often unavailable in real settings.
To address this, the authors propose a utility-weighted evaluation that substitutes missing subconcept labels with posterior probabilities from a multiclass subconcept model.
They define the resulting soft, uncertainty-aware metric called predicted-weighted balanced accuracy (pBA), which aims to produce more stable and interpretable assessments.
Experiments across tabular, medical-imaging, and text benchmarks show that unweighted metrics can be misleading under within-class heterogeneity, while pBA better reflects subgroup performance when the subconcept distributions are uneven but not extreme.

Abstract

Class-level evaluation can conceal substantial performance disparities across subconcepts within the same class, causing models that perform well on average to fail on specific subpopulations. Prior work has shown that common evaluation measures for imbalanced classification are biased toward larger minority subconcepts and that utility-based reweighting using true subconcept labels can mitigate this bias; however, such labels are rarely available at test time. We introduce a practical utility-weighted evaluation that replaces unavailable subconcept labels with predicted posterior probabilities from a multiclass subconcept model. Evaluation weights are defined as the expected utility under this posterior, yielding a soft, uncertainty-aware metric we call predicted-weighted balanced accuracy (pBA). Experiments on tabular benchmarks as well as medical-imaging and text datasets show that unweighted scores can be misleading under within-class heterogeneity, while pBA provides more stable and interpretable assessments when subconcept distributions are uneven but not pathological. Our code is available at: https://anonymous.4open.science/r/correcting-bias-imbalance-9C6C/.

Vector DB and ANN vs PHE conflict, is there a practical workaround? [D]

Reddit r/MachineLearning

Agent Amnesia and the Case of Henry Molaison

Dev.to

Azure Weekly: Microsoft and OpenAI Restructure Partnership as GPT-5.5 Lands in Foundry

Dev.to

Proven Patterns for OpenAI Codex in 2026: Prompts, Validation, and Gateway Governance

Dev.to

Vibe coding is a tool, not a shortcut. Most people are using it wrong.

Dev.to

Correcting Performance Estimation Bias in Imbalanced Classification with Minority Subconcepts

Key Points

Abstract

Related Articles

Vector DB and ANN vs PHE conflict, is there a practical workaround? [D]

Agent Amnesia and the Case of Henry Molaison

Azure Weekly: Microsoft and OpenAI Restructure Partnership as GPT-5.5 Lands in Foundry

Proven Patterns for OpenAI Codex in 2026: Prompts, Validation, and Gateway Governance

Vibe coding is a tool, not a shortcut. Most people are using it wrong.

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer