Needle in a Haystack -- One-Class Representation Learning for Detecting Rare Malignant Cells in Computational Cytology

arXiv cs.CV / 4/10/2026


Key Points

  • The paper frames malignant-cell detection in computational cytology as a severe class-imbalance problem: malignant cells are morphologically diverse yet extremely rare on whole-slide images.
  • It explores one-class representation learning for low "witness rate" settings, where malignant cells make up only a tiny fraction of a slide's cells. The models are trained solely on slide-negative patches, without instance-level supervision, and flag deviations from normality at test time.
  • The authors evaluate two one-class methods, DSVDD and DROC, and compare them against fully and weakly supervised single-instance learning baselines (FS-SIL, WS-SIL) and the recent ItS2CLR method.
  • Experiments on the public TCIA bone marrow cytomorphology dataset and an in-house oral cancer cytology dataset show that DSVDD achieves state-of-the-art instance-level abnormality ranking in ultra-low witness-rate regimes (≤1%), sometimes outperforming fully supervised learning.
  • The work argues that one-class representation learning is a more robust and interpretable alternative to multiple instance learning when malignant instances are exceedingly scarce and annotations are impractical.
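The key points rest on two quantities: the witness rate of a positive slide and an instance-level ranking score. The summary does not name the ranking metric the authors report, so this is a minimal sketch assuming AUROC, computed in its Mann-Whitney form. The helper names `witness_rate` and `instance_auroc` are hypothetical, and the toy slide of 1,000 cells with made-up scores is for illustration only.

```python
import numpy as np

def witness_rate(instance_labels):
    """Fraction of malignant instances (label 1) among all instances of a slide."""
    return float(np.mean(np.asarray(instance_labels)))

def instance_auroc(scores, labels):
    """AUROC as the Mann-Whitney probability that a malignant instance
    outscores a normal one; ties count as one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    # Compare every malignant score against every normal score.
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((greater + 0.5 * ties) / (pos.size * neg.size))

# Toy positive slide (hypothetical data): 1,000 cells, 5 of them malignant.
labels = np.zeros(1000, dtype=int)
labels[[10, 995, 996, 997, 998]] = 1
# Hypothetical anomaly scores: higher means "more abnormal".
scores = np.arange(1000, dtype=float)

wr = witness_rate(labels)              # 0.005, i.e. a 0.5% witness rate
auc = instance_auroc(scores, labels)   # 3986 / 4975, about 0.80
```

In this toy slide the witness rate of 0.5% falls inside the ≤1% regime the paper highlights, and a single malignant cell ranked near the bottom is enough to pull AUROC from 1.0 down to about 0.80, which illustrates why instance-level ranking is demanding when positives are this scarce.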

Abstract

In computational cytology, detecting malignancy on whole-slide images is difficult because malignant cells are morphologically diverse yet vanishingly rare amid a vast background of normal cells. Large class imbalance and limited annotations make these extremely rare cells hard to detect accurately. Conventional weakly supervised approaches, such as multiple instance learning (MIL), often fail to generalize at the instance level, especially when the fraction of malignant cells (the witness rate) is exceedingly low. In this study, we explore one-class representation learning for detecting malignant cells in low-witness-rate scenarios. These methods are trained exclusively on slide-negative patches and require no instance-level supervision. Specifically, we evaluate two one-class classification (OCC) approaches, Deep Support Vector Data Description (DSVDD) and DROC, and compare them with FS-SIL, WS-SIL, and the recent ItS2CLR method. The one-class methods learn compact representations of normality and flag deviations from it at test time. Experiments on a publicly available bone marrow cytomorphology dataset (TCIA) and an in-house oral cancer cytology dataset show that DSVDD achieves state-of-the-art performance in instance-level abnormality ranking, particularly in ultra-low witness-rate regimes (≤ 1%), and in some cases even outperforms fully supervised learning, which is rarely practical in whole-slide cytology because exhaustive instance-level annotation is infeasible. DROC is also competitive under extreme rarity, benefiting from distribution-augmented contrastive learning. These findings position one-class representation learning as a robust, interpretable, and superior alternative to MIL for malignant cell detection under extreme rarity.
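The one-class idea in the abstract, learning a compact representation of normality from slide-negative patches and scoring deviations at test time, can be sketched with the Deep SVDD objective: a bias-free network φ is trained to minimize the mean squared distance (1/n) Σ ||φ(x_i) − c||² of negative embeddings to a fixed center c, and a test instance's anomaly score is ||φ(x) − c||². This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: inputs are flat feature vectors rather than image patches, `TinyDeepSVDD` is a hypothetical two-layer network trained with plain full-batch gradient descent and no weight decay, and synthetic Gaussian data stand in for real cells.

```python
import numpy as np

def leaky_relu(z, a=0.01):
    return np.where(z > 0, z, a * z)

def leaky_relu_grad(z, a=0.01):
    return np.where(z > 0, 1.0, a)

class TinyDeepSVDD:
    """Hypothetical minimal Deep SVDD: phi(x) = W2 @ leaky_relu(W1 @ x), no biases."""

    def __init__(self, in_dim, hid_dim=32, out_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (hid_dim, in_dim))
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hid_dim), (out_dim, hid_dim))
        self.c = None

    def embed(self, X):
        Z1 = X @ self.W1.T
        return leaky_relu(Z1) @ self.W2.T, Z1

    def fit(self, X_neg, epochs=200, lr=1e-2, eps=0.1):
        # Fix the center c as the mean embedding of negatives from an initial forward pass.
        phi, _ = self.embed(X_neg)
        c = phi.mean(axis=0)
        # Assumed guard: push near-zero coordinates of c to +/-eps.
        c[(np.abs(c) < eps) & (c < 0)] = -eps
        c[(np.abs(c) < eps) & (c >= 0)] = eps
        self.c = c
        n = len(X_neg)
        for _ in range(epochs):
            Z1 = X_neg @ self.W1.T                 # (n, hid) pre-activations
            H = leaky_relu(Z1)                     # (n, hid) hidden features
            phi = H @ self.W2.T                    # (n, out) embeddings
            G = 2.0 * (phi - self.c) / n           # dLoss/dphi for mean ||phi - c||^2
            gW2 = G.T @ H                          # (out, hid)
            GH = (G @ self.W2) * leaky_relu_grad(Z1)
            gW1 = GH.T @ X_neg                     # (hid, in)
            self.W1 -= lr * gW1
            self.W2 -= lr * gW2
        return self

    def score(self, X):
        """Anomaly score: squared distance of each embedding to the center c."""
        phi, _ = self.embed(X)
        return np.sum((phi - self.c) ** 2, axis=1)

# Synthetic stand-ins (hypothetical): negatives ~ N(0, I); "abnormal" cells have 5x the spread.
rng = np.random.default_rng(1)
X_train_neg = rng.normal(size=(500, 16))
X_test_normal = rng.normal(size=(200, 16))
X_test_abnormal = 5.0 * rng.normal(size=(200, 16))

model = TinyDeepSVDD(in_dim=16).fit(X_train_neg)
normal_scores = model.score(X_test_normal)
abnormal_scores = model.score(X_test_abnormal)
```

Omitting bias terms and fixing c from an initial forward pass, as recommended by Ruff et al. (2018) for Deep SVDD, prevents the trivial solution in which φ maps every input to c. A real cytology pipeline would presumably replace the toy MLP with a convolutional encoder on cell patches and rank all instances of a slide by `score`.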