A deep learning pipeline for PAM50 subtype classification using histopathology images and multi-objective patch selection

arXiv cs.CV / 4/3/2026


Key Points

  • The paper presents an optimization-driven deep learning framework to predict PAM50 intrinsic breast cancer subtypes directly from H&E whole-slide images, aiming to reduce dependence on costly molecular assays.
  • It jointly optimizes which histopathology patches to use by balancing informativeness, spatial diversity, uncertainty, and patch count, using NSGA-II for multi-objective selection combined with Monte Carlo dropout for uncertainty estimation.
  • A ResNet18 backbone with a custom CNN classification head is used, with the method designed to identify a small but highly informative subset of patches rather than relying on exhaustive sampling.
  • The model is trained on the internal TCGA-BRCA dataset (627 WSIs) and tested on the external CPTAC-BRCA dataset, reaching F1/AUC of 0.8812/0.9841 internally and 0.7952/0.9512 externally.
  • The authors argue the approach improves computational efficiency and supports the feasibility of scalable imaging-based PAM50 classification for potential clinical decision-making.
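
The multi-objective selection described in the key points rests on Pareto dominance, the ranking criterion at the core of NSGA-II. The following is a minimal sketch of that single step, not the authors' implementation: it assumes informativeness and spatial diversity are maximized while uncertainty and patch count are minimized (the summary does not state the directions), and the `Candidate` tuple, the function names, and all scores are hypothetical. A full NSGA-II run would also include crowding-distance sorting, crossover, and mutation, which are omitted here.

```python
from typing import List, Sequence, Tuple

# One candidate patch subset scored on the four objectives named above.
# The field order and the scores below are hypothetical, for illustration only.
# (informativeness, diversity, uncertainty, patch_count)
Candidate = Tuple[float, float, float, int]


def to_minimization(c: Candidate) -> Tuple[float, ...]:
    """Negate the objectives assumed to be maximized so every entry is minimized."""
    informativeness, diversity, uncertainty, patch_count = c
    return (-informativeness, -diversity, uncertainty, float(patch_count))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[Candidate]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    scores = [to_minimization(c) for c in candidates]
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


candidates = [
    (0.90, 0.60, 0.20, 40),
    (0.85, 0.70, 0.25, 30),
    (0.80, 0.55, 0.30, 45),  # worse than the first candidate on all four objectives
    (0.70, 0.80, 0.35, 20),
]
front = pareto_front(candidates)
print(front)
```

Under these assumed scores, the third candidate is dropped because the first beats it on every objective; the remaining three represent different trade-offs, and none of them is strictly better than another.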

Abstract

Breast cancer is a highly heterogeneous disease with diverse molecular profiles. The PAM50 gene signature is widely recognized as a standard for classifying breast cancer into intrinsic subtypes, enabling more personalized treatment strategies. In this study, we introduce a novel optimization-driven deep learning framework that aims to reduce reliance on costly molecular assays by predicting PAM50 subtypes directly from H&E-stained whole-slide images (WSIs). Our method jointly optimizes patch informativeness, spatial diversity, uncertainty, and patch count by combining the non-dominated sorting genetic algorithm II (NSGA-II) with Monte Carlo dropout-based uncertainty estimation, which allows it to identify a small but highly informative subset of patches for classification. We used a ResNet18 backbone for feature extraction and a custom CNN head for classification. For evaluation, we used the internal TCGA-BRCA dataset as the training cohort and the external CPTAC-BRCA dataset as the test cohort. On the internal dataset, comprising 627 WSIs from the TCGA-BRCA cohort, the method achieved an F1-score of 0.8812 and an AUC of 0.9841. On the external validation dataset, it achieved an F1-score of 0.7952 and an AUC of 0.9512. These findings indicate that the proposed optimization-guided, uncertainty-aware patch selection can achieve high performance and improve the computational efficiency of histopathology-based PAM50 classification compared to existing methods, suggesting a scalable imaging-based replacement for molecular assays that has the potential to support clinical decision-making.
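
Monte Carlo dropout estimates uncertainty by running several forward passes with dropout left active and measuring how much the predictions disagree. The sketch below shows one common summary, the entropy of the averaged class probabilities. The abstract does not say which uncertainty measure the authors use, so this choice is an assumption, as are the function names, the four-class setup, and the probability values, which stand in for real model outputs.

```python
import math
from typing import List, Sequence


def mean_probs(samples: Sequence[Sequence[float]]) -> List[float]:
    """Average the class probabilities over the stochastic forward passes."""
    n_passes = len(samples)
    n_classes = len(samples[0])
    return [sum(s[c] for s in samples) / n_passes for c in range(n_classes)]


def predictive_entropy(samples: Sequence[Sequence[float]]) -> float:
    """Entropy (in nats) of the mean prediction; higher means more uncertain."""
    return -sum(p * math.log(p) for p in mean_probs(samples) if p > 0.0)


# Hypothetical softmax outputs for one patch, one row per dropout pass.
confident = [[0.97, 0.01, 0.01, 0.01]] * 3
ambiguous = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]

print(predictive_entropy(confident))
print(predictive_entropy(ambiguous))
```

In the ambiguous case each pass favors a different class, so the averaged prediction is uniform and the entropy reaches its maximum for four classes, log 4. A patch whose passes agree scores much lower.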