Automated Prostate Gland Segmentation in MRI Using nnU-Net

arXiv cs.CV / 4/3/2026

Key Points

  • The study introduces a dedicated deep learning method for automated prostate gland segmentation in multiparametric MRI using the nnU-Net v2 framework rather than relying on general-purpose segmentation tools.
  • By combining multimodal inputs (T2-weighted imaging, DWI, and ADC maps) and training on 981 PI-CAI cases with whole-gland annotations, the model reaches a mean Dice score of 0.96 ± 0.00 in 5-fold cross-validation.
  • External validation on an independent cohort of 54 patients from Hospital La Fe yields a lower Dice score (0.82), reflecting real-world variability, but indicates that the model still generalizes under domain shift.
  • In a head-to-head comparison, the general-purpose TotalSegmentator performs far worse (Dice 0.15), mainly because it under-segments the gland, which underscores the value of a prostate-specific task design.
  • For reproducibility and easier adoption, the model is fully containerized and provided as a ready-to-use inference tool for clinical research workflows.
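All of the figures above are Dice scores, and the TotalSegmentator result is attributed to under-segmentation. The sketch below is a minimal NumPy illustration of why under-segmentation hurts Dice. It defines Dice for binary masks as 2|P ∩ G| / (|P| + |G|). It then builds a toy prediction that lies entirely inside a toy "gland" but covers only a small part of it. The masks, their sizes, and the helper names `dice_score` and `precision_recall` are hypothetical. The numbers are constructed for illustration and are not data from the study.

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # convention (not universal): two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)

def precision_recall(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Voxel-wise precision (TP / |P|) and recall (TP / |G|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = float(tp / pred.sum()) if pred.any() else 0.0
    recall = float(tp / gt.sum()) if gt.any() else 0.0
    return precision, recall

# Hypothetical toy volume: a 5 x 5 x 4 box (100 voxels) stands in for the gland.
gt = np.zeros((10, 10, 10), dtype=bool)
gt[2:7, 2:7, 3:7] = True

# Toy under-segmentation: 8 predicted voxels, all inside the gland.
under = np.zeros_like(gt)
under[2:4, 2:6, 3:4] = True

print(round(dice_score(under, gt), 3))  # 0.148
print(precision_recall(under, gt))      # (1.0, 0.08)
print(dice_score(gt, gt))               # 1.0
```

Every predicted voxel is correct here (precision 1.0), yet Dice stays low because most of the gland is missed (recall 0.08). This is the failure pattern the authors describe, though the toy values are not meant to reproduce their measurement.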

Abstract

Accurate segmentation of the prostate gland in multiparametric MRI (mpMRI) is a fundamental step for a wide range of clinical and research applications, including image registration, volume estimation, and radiomic analysis. However, manual delineation is time-consuming and subject to inter-observer variability, while general-purpose segmentation tools often fail to provide sufficient accuracy for prostate-specific tasks. In this work, we propose a dedicated deep learning-based approach for automatic prostate gland segmentation using the nnU-Net v2 framework. The model leverages multimodal mpMRI data, including T2-weighted imaging, diffusion-weighted imaging (DWI), and apparent diffusion coefficient (ADC) maps, to exploit complementary tissue information. Training was performed on 981 cases from the PI-CAI dataset using whole-gland annotations, and model performance was assessed through 5-fold cross-validation and external validation on an independent cohort of 54 patients from Hospital La Fe. The proposed model achieved a mean Dice score of 0.96 ± 0.00 in cross-validation and 0.82 on the external test set, demonstrating strong generalization despite domain shift. In comparison, a general-purpose approach (TotalSegmentator) showed substantially lower performance, with a Dice score of 0.15, primarily due to under-segmentation of the gland. These results highlight the importance of task-specific, multimodal segmentation strategies and demonstrate the potential of the proposed approach for reliable integration into clinical research workflows. To facilitate reproducibility and deployment, the model has been fully containerized and is available as a ready-to-use inference tool.
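The abstract describes a three-channel input (T2-weighted, DWI, ADC) trained with nnU-Net v2. In that framework, multimodal inputs are declared in the dataset's `dataset.json` and stored as one image file per channel. The sketch below builds such a description. Several details are assumptions, not the authors' configuration: the channel order and names (`T2W`, `DWI`, `ADC`), the label name `prostate`, the case identifier, and the helper functions are all hypothetical. The source does not state how these were set.

```python
import json

# Hypothetical channel order: the source lists T2W, DWI and ADC as inputs
# but does not state the order or the channel names the authors used.
CHANNELS = ["T2W", "DWI", "ADC"]
FILE_ENDING = ".nii.gz"

def build_dataset_json(num_training: int) -> dict:
    """Minimal nnU-Net v2 dataset.json for a 3-channel, whole-gland task."""
    return {
        "channel_names": {str(i): name for i, name in enumerate(CHANNELS)},
        "labels": {"background": 0, "prostate": 1},  # single whole-gland label
        "numTraining": num_training,
        "file_ending": FILE_ENDING,
    }

def image_filenames(case_id: str) -> list[str]:
    """One image per channel, named {CASE}_{XXXX}{file_ending}."""
    return [f"{case_id}_{i:04d}{FILE_ENDING}" for i in range(len(CHANNELS))]

def label_filename(case_id: str) -> str:
    """The segmentation mask has no channel suffix."""
    return f"{case_id}{FILE_ENDING}"

if __name__ == "__main__":
    print(json.dumps(build_dataset_json(981), indent=2))
    print(image_filenames("case_001"))  # hypothetical case ID
    print(label_filename("case_001"))
```

nnU-Net v2 uses the channel names to choose intensity normalization. A channel marked "CT" gets a global foreground-based scheme, and other names get per-channel z-scoring, so MR channels need no special tag. With the raw dataset in place, the usual workflow runs `nnUNetv2_plan_and_preprocess -d <ID> --verify_dataset_integrity`. It then runs `nnUNetv2_train <ID> <CONFIGURATION> <FOLD>` for folds 0 through 4, which matches the 5-fold cross-validation reported above. The configuration the authors chose (for example `3d_fullres`) is not stated in the source.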