Improving Generalization of Deep Learning for Brain Metastases Segmentation Across Institutions

arXiv cs.AI / 4/2/2026


Key Points

  • The study addresses a key limitation of deep learning for brain metastases (BM) segmentation: models trained on a single institution can underperform elsewhere due to differences in scanners, imaging protocols, and demographics.
  • It proposes a domain-adaptation preprocessing pipeline (VAE-MMD) that aligns features across institutions by combining a variational autoencoder (VAE) with a maximum mean discrepancy (MMD) loss. The pipeline incorporates skip connections and self-attention and is paired with nnU-Net for segmentation.
  • Evaluated on 740 patients from four public datasets (Stanford, UCSF, UCLM, PKG), the approach substantially improves cross-site performance without requiring target-domain labels.
  • The results show stronger segmentation quality across volumetric, detection, and boundary metrics versus baseline nnU-Net: an 11.1% relative increase in mean F1 (0.700 to 0.778), a 7.93% relative increase in mean surface Dice (0.7121 to 0.7686), and a 65.5% reduction in mean HD95 (11.33 to 3.91 mm).
  • Feature alignment is supported by a major drop in domain classifier accuracy (0.91 to 0.50), suggesting the method successfully reduces cross-institution heterogeneity in learned representations.
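
As a rough illustration of the MMD term named in the points above, the sketch below computes a biased squared-MMD estimate between two sets of feature vectors. The Gaussian kernel, the bandwidth `sigma`, and the synthetic "site" samples are assumptions made for this example; the summary does not state which kernel the authors use or how the term is weighted inside the VAE objective.

```python
import math
import random


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors."""

    def mean_kernel(ps, qs):
        total = sum(rbf_kernel(p, q, sigma) for p in ps for q in qs)
        return total / (len(ps) * len(qs))

    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def sample(mean, n, dim, rng):
    """Draw n synthetic feature vectors from an isotropic Gaussian (toy data)."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
site_a = sample(0.0, 50, 4, rng)        # hypothetical latent features, site A
site_a_again = sample(0.0, 50, 4, rng)  # second draw from the same distribution
site_b = sample(1.5, 50, 4, rng)        # hypothetical site with shifted features

mmd_same = mmd_squared(site_a, site_a_again)
mmd_shifted = mmd_squared(site_a, site_b)
print(f"same distribution: {mmd_same:.4f}, shifted distribution: {mmd_shifted:.4f}")
```

The estimate is small when both samples come from the same distribution and larger when one is shifted. Minimizing such a term over latent features from different institutions is the general idea behind MMD-based alignment.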

Abstract

Background: Deep learning has demonstrated significant potential for automated brain metastases (BM) segmentation; however, models trained at a single institution often perform poorly at other sites because of differences in scanner hardware, imaging protocols, and patient demographics. The goal of this work is to create a domain adaptation framework that enables BM segmentation across multiple institutions.

Methods: We propose a VAE-MMD preprocessing pipeline that combines variational autoencoders (VAE) with maximum mean discrepancy (MMD) loss, incorporating skip connections and self-attention mechanisms alongside nnU-Net segmentation. The method was tested on 740 patients from four public databases (Stanford, UCSF, UCLM, and PKG) and evaluated using domain classifier accuracy, sensitivity, precision, F1/F2 scores, surface Dice (sDice), and 95th percentile Hausdorff distance (HD95).

Results: VAE-MMD reduced domain classifier accuracy from 0.91 to 0.50, indicating successful feature alignment across institutions. Reconstructed volumes attained a PSNR greater than 36 dB, maintaining anatomical accuracy. Compared to the baseline nnU-Net, the combined method raised the mean F1 by 11.1% (0.700 to 0.778) and the mean sDice by 7.93% (0.7121 to 0.7686), and reduced the mean HD95 by 65.5% (11.33 to 3.91 mm) across all four centers.

Conclusions: VAE-MMD effectively diminishes cross-institutional data heterogeneity and enhances BM segmentation generalization across volumetric, detection, and boundary-level metrics without requiring target-domain labels, thereby overcoming a significant obstacle to the clinical implementation of AI-assisted segmentation.
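
The percentages in the abstract are relative changes against the baseline values, which a few lines of arithmetic can confirm. The sketch below also shows the standard F-beta formula behind the F1/F2 scores. The helper names and the precision/recall pair passed to `f_beta` are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def relative_change(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100.0


def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score; beta=2 weights recall (sensitivity) more heavily."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1.0 + b2) * precision * recall / denom


# Values reported in the abstract (baseline nnU-Net -> combined method).
f1_gain = relative_change(0.700, 0.778)       # about +11.1 %
sdice_gain = relative_change(0.7121, 0.7686)  # about +7.93 %
hd95_change = relative_change(11.33, 3.91)    # about -65.5 %

print(f"F1: {f1_gain:+.1f}%  sDice: {sdice_gain:+.2f}%  HD95: {hd95_change:+.1f}%")

# Hypothetical detection figures, only to show how F1 and F2 differ.
example_f1 = f_beta(precision=0.80, recall=0.70, beta=1.0)
example_f2 = f_beta(precision=0.80, recall=0.70, beta=2.0)
print(f"F1 = {example_f1:.3f}, F2 = {example_f2:.3f}")
```

Because HD95 is a distance, a negative change is the improvement here, while F1 and sDice improve as they rise.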