Multi-Stage Fine-Tuning of Pathology Foundation Models with Head-Diverse Ensembling for White Blood Cell Classification

arXiv cs.CV / 3/24/2026


Key Points

  • The paper proposes a multi-stage fine-tuning approach for 13-class white blood cell (WBC) classification that targets three obstacles to automated leukemia diagnosis: class imbalance, domain shift, and morphological continuum confusion (adjacent maturation stages with subtle, overlapping features).
  • It fine-tunes a DINOBloom-base model and trains three classifier head families on it (linear, cosine, and MLP). Each head specializes by maturation stage: cosine performs best on the mature granulocyte boundary (band neutrophils), linear on more immature granulocytes (metamyelocytes), and MLP on the most immature (promyelocytes).
  • Leveraging this specialization, the authors build a head-diverse ensemble in which the MLP head is the primary predictor; its predictions within four predefined confusion pairs are replaced only when two other head families agree.
  • The study also reports that samples misclassified by all models are substantially enriched for probable label errors or inherently ambiguous morphology, pointing to limits on achievable separability and to the value of such cases for auditing data quality.
  • The method targets the 13-class task of the WBCBench 2026 Challenge (ISBI 2026), which serves as its evaluation benchmark.
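The three classifier head families listed above can be contrasted with a small, dependency-free sketch that operates on a single backbone embedding `x`. All names (`linear_head`, `cosine_head`, `mlp_head`), the single ReLU hidden layer, and the cosine scale `scale=10.0` are illustrative assumptions, not details taken from the paper, and the weights are placeholders rather than trained parameters.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]  # one row per output class (or hidden unit)


def linear_head(x: Sequence[float], W: Matrix, b: Sequence[float]) -> Vector:
    """Linear head: logit z_c = w_c . x + b_c."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + bc for w, bc in zip(W, b)]


def _normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v] if norm > 0 else list(v)


def cosine_head(x: Sequence[float], W: Matrix, scale: float = 10.0) -> Vector:
    """Cosine head: logit z_c = scale * cos(x, w_c).

    Only the direction of x matters; its norm is discarded.
    `scale` is a hypothetical temperature-like hyperparameter.
    """
    xn = _normalize(x)
    return [scale * sum(a * b for a, b in zip(_normalize(w), xn)) for w in W]


def mlp_head(
    x: Sequence[float],
    W1: Matrix, b1: Sequence[float],
    W2: Matrix, b2: Sequence[float],
) -> Vector:
    """MLP head (assumed shape): one ReLU hidden layer, then a linear output layer."""
    hidden = [max(0.0, z) for z in linear_head(x, W1, b1)]
    return linear_head(hidden, W2, b2)
```

In a real pipeline these heads would be trainable modules (for example in PyTorch) placed on top of the fine-tuned DINOBloom-base embedding; plain Python is used here only to keep the sketch self-contained. The design difference worth noting is that the cosine head compares directions only, while the linear and MLP heads also respond to embedding magnitude, which is one way such heads can end up with different class-specific strengths.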

Abstract

The classification of white blood cells (WBCs) from peripheral blood smears is critical for the diagnosis of leukemia. However, automated approaches still struggle due to challenges including class imbalance, domain shift, and morphological continuum confusion, where adjacent maturation stages exhibit subtle, overlapping features. We present a multi-stage fine-tuning methodology for 13-class WBC classification in the WBCBench 2026 Challenge (ISBI 2026). Our best-performing model is a fine-tuned DINOBloom-base, on which we train multiple classifier head families (linear, cosine, and multilayer perceptron (MLP)). The cosine head performed best on the mature granulocyte boundary (Band neutrophil (BNE) F1 = 0.470), the linear head on more immature granulocyte classes (Metamyelocyte (MMY) F1 = 0.585), and the MLP head on the most immature granulocyte (Promyelocyte (PMY) F1 = 0.733), revealing class-specific specialization. Based on this specialization, we construct a head-diverse ensemble, where the MLP head acts as the primary predictor, and its predictions within the four predefined confusion pairs are replaced only when two other head families agree. We further show that cases consistently misclassified by all models are substantially enriched for probable labeling errors or inherent morphological ambiguity.
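The ensemble rule described in the abstract can be sketched as a per-cell decision function. This is a minimal sketch under stated assumptions: the function name `head_diverse_predict` is hypothetical, the override is read as "the non-MLP heads agree on a label that forms a confusion pair with the MLP's label", and the pair `("MMY", "BNE")` is a hypothetical example only, since the abstract does not list the four predefined pairs.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical confusion pair for illustration only; the paper's four
# predefined pairs are not listed in the abstract.
EXAMPLE_PAIRS = [("MMY", "BNE")]


def head_diverse_predict(
    mlp_pred: str,
    other_preds: Sequence[str],
    confusion_pairs: Iterable[Tuple[str, str]],
) -> str:
    """Return the ensemble label for one cell (assumed reading of the rule).

    The MLP prediction is kept unless (a) the other head families all agree
    on one label and (b) that label and the MLP label form a predefined
    confusion pair. Only then is the MLP prediction replaced.
    """
    pairs = {frozenset(p) for p in confusion_pairs}
    if len(set(other_preds)) != 1:
        return mlp_pred  # other heads disagree (or are missing): MLP stays primary
    agreed = other_preds[0]
    if frozenset((mlp_pred, agreed)) in pairs:
        return agreed  # override only inside a confusion pair
    return mlp_pred  # agreement outside any pair (or same label): keep MLP


# Usage with (linear, cosine) predictions as the two other head families:
print(head_diverse_predict("MMY", ["BNE", "BNE"], EXAMPLE_PAIRS))  # → BNE
print(head_diverse_predict("MMY", ["BNE", "MMY"], EXAMPLE_PAIRS))  # → MMY
print(head_diverse_predict("PMY", ["BNE", "BNE"], EXAMPLE_PAIRS))  # → PMY
```

Restricting overrides to confusion pairs keeps the MLP head in charge everywhere else, so the secondary heads can only correct errors along the adjacent-maturation boundaries where they were observed to be stronger.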