Minimax Generalized Cross-Entropy

arXiv stat.ML / 3/23/2026


Key Points

  • The paper introduces a minimax formulation of generalized cross-entropy (MGCE) that makes optimization convex over classification margins, addressing non-convexity in prior GCE methods.
  • MGCE provides an upper bound on classification error and is optimized via a bilevel convex optimization framework that can be implemented efficiently with implicit differentiation.
  • Experiments on benchmark datasets show MGCE achieves stronger accuracy, faster convergence, and better calibration, especially in the presence of label noise.
  • The work positions MGCE as a robust alternative for training classifiers, with potential to influence practical model-training workflows.

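For context on the loss being generalized, the standard GCE loss of Zhang & Sabuncu (2018) is L_q(p, y) = (1 - p_y^q) / q, which recovers cross-entropy as q → 0 and the (rescaled) MAE loss at q = 1. A minimal NumPy sketch of this interpolation (not the paper's minimax variant, which the summary does not specify in detail):

```python
import numpy as np

def gce_loss(probs, labels, q=0.7):
    """Generalized cross-entropy: L_q(p, y) = (1 - p_y^q) / q.

    probs:  (n, k) array of predicted class probabilities
    labels: (n,) array of integer class labels
    q:      interpolation parameter in (0, 1];
            q -> 0 recovers cross-entropy -log p_y,
            q = 1 gives 1 - p_y (MAE up to a factor of 2).
    """
    p_y = probs[np.arange(len(labels)), labels]  # probability of true class
    return np.mean((1.0 - p_y ** q) / q)
```

Intermediate values of q trade the fast optimization of CE against the label-noise robustness of MAE, which is the trade-off the minimax formulation aims to make convex.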
Abstract

Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the mean absolute error (MAE) loss can offer robustness but is difficult to optimize. Interpolating between the CE and MAE losses, generalized cross-entropy (GCE) has recently been introduced to provide a trade-off between optimization difficulty and robustness. Existing formulations of GCE result in a non-convex optimization over classification margins that is prone to underfitting, leading to poor performance on complex datasets. In this paper, we propose a minimax formulation of generalized cross-entropy (MGCE) that results in a convex optimization over classification margins. Moreover, we show that MGCE can provide an upper bound on the classification error. The proposed bilevel convex optimization can be efficiently implemented using stochastic gradients computed via implicit differentiation. Using benchmark datasets, we show that MGCE achieves strong accuracy, faster convergence, and better calibration, especially in the presence of label noise.
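The abstract mentions computing stochastic gradients via implicit differentiation for a bilevel problem. The paper's exact MGCE objective is not given in this summary, but the generic implicit-function-theorem recipe can be illustrated on a toy bilevel problem (inner loss f(w, λ) = ½(w − λ)², outer objective F(w) = ½w², both hypothetical here):

```python
def hypergradient(lam, inner_steps=100, lr=0.1):
    """Toy implicit-differentiation hypergradient (illustrative only).

    Inner problem:   w*(lam) = argmin_w 0.5 * (w - lam)**2
    Outer objective: F(w)    = 0.5 * w**2

    Implicit function theorem at the inner optimum:
      dw*/dlam = -(d2f/dw dlam) / (d2f/dw2)
      dF/dlam  = (dF/dw at w*) * dw*/dlam
    """
    # Approximately solve the inner problem by gradient descent.
    w = 0.0
    for _ in range(inner_steps):
        w -= lr * (w - lam)          # inner gradient: d/dw 0.5*(w - lam)^2
    # Implicit-differentiation terms (closed form for this quadratic).
    d2f_dw2 = 1.0                    # Hessian of inner loss in w
    d2f_dwdlam = -1.0                # mixed second derivative
    dwstar_dlam = -d2f_dwdlam / d2f_dw2
    dF_dw = w                        # outer gradient evaluated at w*
    return dF_dw * dwstar_dlam       # hypergradient dF/dlam
```

Here F(w*(λ)) = ½λ², so the hypergradient should approach λ; the same two-level pattern, with the inner Hessian solve done implicitly, is what makes stochastic bilevel training tractable.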