Ensemble of Small Classifiers For Imbalanced White Blood Cell Classification

arXiv cs.CV / 3/24/2026

Key Points

  • The paper addresses automated white blood cell classification for leukemia diagnosis, emphasizing the difficulty of building robust models under class imbalance and inter-patient variability in staining and scanning conditions.
  • It proposes a lightweight ensemble for classifying cells in Granulopoiesis, Monocytopoiesis, and Lymphopoiesis, averaging the logits of pretrained SwinV2-Tiny, DinoBloom-Small, and ConvNeXT-V2-Tiny models.
  • To alleviate some of the rare-cell class imbalance, the authors expand the dataset. They train 3 instantiations of each of the 3 architectures in a stratified 3-fold cross-validation framework (9 models total).
  • The authors report excellent performance on a challenging WBC dataset and discuss where the ensemble fails: it confuses similar-looking myelocytes within granulopoiesis and similar-looking lymphocytes within lymphopoiesis.
  • The code is publicly available in the linked GitLab repository for replication and further experimentation.
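
The logit-averaging step mentioned in the key points can be illustrated with a minimal sketch. This is not the authors' code: the function names (`average_logits`, `softmax`, `ensemble_predict`) and the toy logits are hypothetical, and the sketch assumes every model emits one logit per class in the same class order.

```python
import math


def average_logits(per_model_logits):
    """Average class logits element-wise across models for a single image."""
    n_models = len(per_model_logits)
    n_classes = len(per_model_logits[0])
    if any(len(logits) != n_classes for logits in per_model_logits):
        raise ValueError("all models must output the same number of classes")
    return [
        sum(logits[c] for logits in per_model_logits) / n_models
        for c in range(n_classes)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits):
    """Return (predicted class index, class probabilities) for the ensemble."""
    probs = softmax(average_logits(per_model_logits))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical logits from three models over three classes (toy values).
toy_logits = [
    [2.0, 0.5, -1.0],
    [1.0, 1.5, -0.5],
    [3.0, 0.0, -2.0],
]
label, probs = ensemble_predict(toy_logits)
```

Averaging logits before the softmax, rather than averaging each model's probabilities, lets a confident model carry more weight in the final decision. In the setup described here, the list would hold 9 logit vectors, one per trained model.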

Abstract

Automating white blood cell classification for diagnosis of leukaemia is a promising alternative to time-consuming and resource-intensive examination of cells by expert pathologists. However, designing robust algorithms for classification of rare cell types remains challenging due to variations in staining, scanning and inter-patient heterogeneity. We propose a lightweight ensemble approach for classification of cells during Haematopoiesis, with a focus on the biology of Granulopoiesis, Monocytopoiesis and Lymphopoiesis. Through dataset expansion to alleviate some class imbalance, we demonstrate that a simple ensemble of lightweight pretrained SwinV2-Tiny, DinoBloom-Small and ConvNeXT-V2-Tiny models achieves excellent performance on this challenging dataset. We train 3 instantiations of each architecture in a stratified 3-fold cross-validation framework; for an input image, we forward-pass through all 9 models and aggregate through logit averaging. We further reason on the weaknesses of our model in confusing similar-looking myelocytes in granulopoiesis and lymphocytes in lymphopoiesis. Code: https://gitlab.com/siddharthsrivastava/wbc-bench-2026.
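
The stratified 3-fold split in the abstract can be sketched as follows. This is an illustrative assumption, not the paper's implementation: `stratified_folds` and `train_val_splits` are hypothetical helpers, and the sketch assumes one model per architecture is trained on each fold's training portion, which would yield the 9 models described.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=3, seed=0):
    """Assign sample indices to folds so each class is spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    offset = 0  # rotate the starting fold so leftover samples are spread out
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % n_folds].append(idx)
        offset += len(indices)
    return folds


def train_val_splits(folds):
    """Yield (train_indices, val_indices), holding out each fold once."""
    for k, val in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, val


# Toy imbalanced label list: a common class and a rare class.
toy_labels = ["neutrophil"] * 6 + ["basophil"] * 3
toy_folds = stratified_folds(toy_labels, n_folds=3, seed=0)
toy_splits = list(train_val_splits(toy_folds))
```

Stratifying by class keeps rare cell types present in every fold, so each model sees them during training and each validation fold can still measure performance on them.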