Clinical DVH metrics as a loss function for 3D dose prediction in head and neck radiotherapy

arXiv cs.CV / 4/1/2026

Key Points

  • The paper argues that voxel-wise regression losses used for 3D radiotherapy dose prediction do not align well with clinical plan evaluation, which relies on DVH (dose-volume histogram) metrics.
  • It proposes a clinically guided loss function (CDM loss) that directly optimizes differentiable D-metrics and surrogate V-metrics, using bit-mask ROI encoding to improve training efficiency.
  • On 174 head-and-neck radiotherapy patients, CDM loss outperformed MAE- and DVH-curve-based training objectives by improving target coverage while keeping OAR (organ-at-risk) sparing comparable.
  • The authors report that adding CDM loss reduced PTV Score from 1.544 (MAE) to 0.491 (MAE + CDM), and that bit-mask ROI encoding cut training time by 83% and reduced GPU memory usage.
  • The work concludes that directly optimizing clinically used DVH metrics with efficient ROI handling yields dose predictions better matched to real treatment planning criteria and is scalable for practical use.
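
The D-metrics and V-metrics named in the key points can be illustrated with a minimal NumPy sketch. This is not the authors' CDM loss; the function names, the sort-based D-metric, the sigmoid surrogate, and the temperature `tau` are all assumptions made for illustration. A D-metric such as D95 is the dose that at least 95% of an ROI's volume receives, and a V-metric is the fraction of the ROI receiving at least a given dose. The hard threshold in a V-metric has zero gradient almost everywhere, so one common workaround is to replace it with a smooth sigmoid.

```python
import numpy as np

def d_metric(dose, mask, volume_pct):
    """D_x: the dose that at least x% of the ROI volume receives.

    Sort-and-index is used here; in an autodiff framework the same
    operation passes gradients to the selected voxel (assumption: this
    is one possible differentiable form, not necessarily the paper's).
    """
    roi = np.sort(dose[mask])[::-1]  # ROI doses, hottest first
    k = max(int(np.ceil(volume_pct * roi.size / 100.0)), 1)
    return float(roi[k - 1])

def v_metric_hard(dose, mask, dose_level):
    """V_x: exact fraction of ROI voxels with dose >= dose_level."""
    return float(np.mean(dose[mask] >= dose_level))

def v_metric_soft(dose, mask, dose_level, tau=0.1):
    """Hypothetical smooth surrogate for V_x.

    The step function is replaced by a sigmoid with temperature tau,
    written via tanh for numerical stability.
    """
    x = (dose[mask] - dose_level) / tau
    return float(np.mean(0.5 * (1.0 + np.tanh(0.5 * x))))

# Toy volume: 100 voxels with doses 0, 1, ..., 99 (arbitrary units).
dose = np.linspace(0.0, 99.0, 100).reshape(4, 5, 5)
mask = np.ones_like(dose, dtype=bool)

d95 = d_metric(dose, mask, 95)
v50_hard = v_metric_hard(dose, mask, 50.0)
v50_soft = v_metric_soft(dose, mask, 50.0)
```

A smaller `tau` brings the surrogate closer to the hard V-metric but makes its gradient more sharply peaked around the dose threshold, so the value would need tuning in practice.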

Abstract

Purpose: Deep-learning-based three-dimensional (3D) dose prediction is widely used in automated radiotherapy workflows. However, most existing models are trained with voxel-wise regression losses, which are poorly aligned with clinical plan evaluation criteria based on dose-volume histogram (DVH) metrics. This study aims to develop a clinically guided loss formulation that directly optimizes clinically used DVH metrics while remaining computationally efficient for head and neck (H&N) dose prediction.

Methods: We propose a clinical DVH metric loss (CDM loss) that incorporates differentiable D-metrics and surrogate V-metrics, together with a lossless bit-mask region-of-interest (ROI) encoding to improve training efficiency. The method was evaluated on 174 H&N patients using a temporal split (137 training, 37 testing).

Results: Compared with MAE- and DVH-curve-based losses, CDM loss substantially improved target coverage and satisfied all clinical constraints. Using a standard 3D U-Net, the PTV Score was reduced from 1.544 (MAE) to 0.491 (MAE + CDM), while OAR sparing remained comparable. Bit-mask encoding reduced training time by 83% and lowered GPU memory usage.

Conclusion: Directly optimizing clinically used DVH metrics enables 3D dose predictions that are better aligned with clinical treatment planning criteria than conventional voxel-wise or DVH-curve-based supervision. The proposed CDM loss, combined with efficient ROI bit-mask encoding, provides a practical and scalable framework for H&N dose prediction.
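
The abstract does not spell out how the bit-mask ROI encoding works, so the following is only a sketch of one plausible reading: each ROI is assigned one bit of an integer volume, so a voxel that belongs to several overlapping structures keeps all of its memberships and every mask can be recovered exactly. The helper names `encode_rois` and `decode_roi` and the 32-ROI limit are assumptions for illustration, not details from the paper.

```python
import numpy as np

def encode_rois(masks):
    """Pack boolean ROI masks of identical shape into one uint32 volume.

    ROI number `bit` is stored in bit position `bit` of every voxel
    (hypothetical layout, at most 32 ROIs).
    """
    if len(masks) > 32:
        raise ValueError("this sketch supports at most 32 ROIs")
    packed = np.zeros(masks[0].shape, dtype=np.uint32)
    for bit, m in enumerate(masks):
        packed |= m.astype(np.uint32) << np.uint32(bit)
    return packed

def decode_roi(packed, bit):
    """Recover the boolean mask stored in bit position `bit`."""
    return ((packed >> np.uint32(bit)) & np.uint32(1)).astype(bool)

# Two overlapping toy ROIs on a 2x2x2 grid.
roi_a = np.zeros((2, 2, 2), dtype=bool)
roi_a[0] = True
roi_b = np.zeros((2, 2, 2), dtype=bool)
roi_b[:, 0] = True

packed = encode_rois([roi_a, roi_b])
recovered_a = decode_roi(packed, 0)
recovered_b = decode_roi(packed, 1)
```

With this layout a data loader would move a single integer volume per patient instead of one mask channel per structure, and individual masks can be unpacked on demand with a shift and a bitwise AND.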