CAMO: A Class-Aware Minority-Optimized Ensemble for Robust Language Model Evaluation on Imbalanced Data

arXiv cs.CL / 4/10/2026


Key Points

  • The paper introduces CAMO (Class-Aware Minority-Optimized), an ensemble method designed to improve language model evaluation and prediction under severe class imbalance by avoiding majority-class dominance.
  • CAMO uses a hierarchical strategy that combines vote distributions, confidence calibration, and inter-model uncertainty to dynamically boost underrepresented classes and strengthen minority forecasts.
  • Experiments on two unbalanced, domain-specific benchmarks (DIAR-AI/Emotion and ternary BEA 2025) evaluate CAMO against seven established ensemble approaches using eight language models (including both LLMs and SLMs) across zero-shot and fine-tuned settings.
  • Results indicate that CAMO achieves the best strict macro F1-score with fine-tuned models, and that ensemble effectiveness depends on model properties, especially when model adaptation is applied.
  • The authors claim CAMO is a domain-neutral framework for imbalanced categorization and a reliable approach for robust evaluation in real-world, skewed datasets.
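The key points above describe CAMO only at a high level; the paper's actual procedure is not reproduced here. Still, the core idea of combining per-model confidence, inter-model uncertainty, and an inverse-frequency boost for minority classes can be sketched. Everything below (the function name `camo_style_vote`, the entropy-based model weighting, the inverse-frequency boost) is a hypothetical illustration, not the authors' algorithm:

```python
import numpy as np

def camo_style_vote(probs, class_freq, uncertainty_weight=True):
    """Hypothetical sketch of a class-aware, minority-boosted ensemble vote.

    probs: (n_models, n_classes) per-model class probabilities.
    class_freq: (n_classes,) training-set class frequencies.
    Returns the index of the winning class.
    """
    probs = np.asarray(probs, dtype=float)
    freq = np.asarray(class_freq, dtype=float)

    # Inverse-frequency boost: rarer classes receive larger weights.
    boost = freq.sum() / (len(freq) * freq)

    if uncertainty_weight:
        # Down-weight models whose predictions are uncertain (high entropy).
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        model_w = 1.0 - entropy / np.log(probs.shape[1])
    else:
        model_w = np.ones(probs.shape[0])

    # Confidence-weighted average of the votes, then the class-aware boost.
    avg = (model_w[:, None] * probs).sum(axis=0) / model_w.sum()
    return int(np.argmax(avg * boost))
```

With a 90/10 class skew, two models that each lean 60–70% toward the majority class still yield a minority-class decision here, because the inverse-frequency boost outweighs the confidence margin; with balanced frequencies the boost is a no-op and the vote reduces to a confidence-weighted average.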

Abstract

Real-world categorization is severely hampered by class imbalance: traditional ensembles favor majority classes, which lowers minority performance and the overall F1-score. We propose CAMO (Class-Aware Minority-Optimized), a novel ensemble technique for imbalanced problems. Through a hierarchical procedure that incorporates vote distributions, confidence calibration, and inter-model uncertainty, CAMO dynamically boosts underrepresented classes while preserving and amplifying minority forecasts. We validate CAMO on two highly imbalanced, domain-specific benchmarks, the DIAR-AI/Emotion dataset and the ternary BEA 2025 dataset, benchmarking against seven established ensemble algorithms with eight language models (three LLMs and five SLMs) under zero-shot and fine-tuned settings. With fine-tuned models, CAMO consistently achieves the highest strict macro F1-score, setting a new benchmark. Its advantage works in concert with model adaptation, showing that the best ensemble choice depends on model properties. This demonstrates that CAMO is a reliable, domain-neutral framework for imbalanced categorization.
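The macro F1-score that the paper targets averages per-class F1 with equal weight, so a predictor that collapses onto the majority class scores poorly despite high accuracy. A minimal, self-contained illustration (the `macro_f1` helper and the toy 90/10 split are ours, not the paper's):

```python
def macro_f1(y_true, y_pred, labels):
    """Strict macro F1: unweighted mean of per-class F1 scores.

    Per class c, F1 = 2*TP / (2*TP + FP + FN); classes the model
    never predicts correctly contribute 0 and drag the mean down.
    """
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# Toy 90/10 imbalance: class 0 dominates the ground truth.
y_true = [0] * 90 + [1] * 10

# A degenerate predictor that always emits the majority class:
# 90% accuracy, but macro F1 of only ~0.47 (class 1 contributes 0).
y_majority = [0] * 100

# A predictor that recovers 8 of 10 minority cases at a small
# cost of 5 majority misclassifications: macro F1 rises to ~0.83.
y_minority_aware = [0] * 85 + [1] * 5 + [1] * 8 + [0] * 2
```

This is the failure mode the paper attributes to majority-favoring ensembles: the accuracy-optimal move under skew is to ignore minority classes, which is exactly what macro F1 penalizes.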