Domain-Guided YOLO26 with Composite BCE-Dice-Lovász Loss for Multi-Class Fetal Head Ultrasound Segmentation

arXiv cs.CV / 3/31/2026


Key Points

  • The paper addresses fetal head ultrasound segmentation with a prompt-free pipeline that jointly detects and segments the Brain, Cavum Septi Pellucidi (CSP), and Lateral Ventricles (LV) in a single YOLO26-Seg forward pass.
  • It introduces a composite BCE-Dice-Lovász loss with inverse-frequency class weighting, integrated into YOLO26 training via runtime monkey-patching to better handle class imbalance.
  • The approach improves minority-class learning using domain-guided copy-paste augmentation that preserves anatomical context relative to the brain boundary.
  • It reports strong performance on 575 held-out test images, achieving a mean Dice of 0.9253 versus the baseline's 0.9012 (a 2.68% relative gain), and includes ablations analyzing each component's contribution and sensitivity to annotation quality and class imbalance.
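
The composite loss described above can be sketched as follows. This is an illustrative NumPy reconstruction under stated assumptions (sigmoid activations, equal weighting of the three terms, standard smoothing constants), not the authors' implementation; function names are hypothetical.

```python
import numpy as np

def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss,
    for ground-truth labels sorted by descending error."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1.0 - gt_sorted)
    jaccard = 1.0 - intersection / union
    jaccard[1:] = jaccard[1:] - jaccard[:-1]
    return jaccard

def lovasz_hinge(logits, labels):
    """Binary Lovász hinge loss on flat arrays (0 for a perfect margin)."""
    signs = 2.0 * labels - 1.0
    errors = 1.0 - logits * signs
    order = np.argsort(-errors)              # sort errors descending
    grad = lovasz_grad(labels[order])
    return np.dot(np.maximum(errors[order], 0.0), grad)

def composite_loss(logits, targets, class_freqs):
    """Per-class BCE + soft Dice + Lovász hinge, combined with
    inverse-frequency class weights.  logits/targets: (C, N) flat masks."""
    weights = 1.0 / (np.asarray(class_freqs) + 1e-6)
    weights = weights / weights.sum()        # rarer classes weigh more
    total = 0.0
    for c, w in enumerate(weights):
        p = 1.0 / (1.0 + np.exp(-logits[c]))  # sigmoid probabilities
        t = targets[c]
        bce = -np.mean(t * np.log(p + 1e-7) + (1 - t) * np.log(1 - p + 1e-7))
        dice = 1.0 - (2.0 * (p * t).sum() + 1.0) / (p.sum() + t.sum() + 1.0)
        lov = lovasz_hinge(logits[c], t)
        total += w * (bce + dice + lov)
    return total
```

In the paper this loss replaces the stock YOLO26-Seg mask loss via runtime monkey-patching of the training loop, so no fork of the library is needed.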

Abstract

Segmenting fetal head structures from prenatal ultrasound remains a practical bottleneck in obstetric imaging. The current state-of-the-art baseline, proposed alongside the published dataset, adapts the Segment Anything Model with per-class Dice and Lovász losses but still depends on bounding-box prompts at test time. We build a prompt-free pipeline on top of YOLO26-Seg that jointly detects and segments three structures, Brain, Cavum Septi Pellucidi (CSP), and Lateral Ventricles (LV), in a single forward pass. Three modifications are central to our approach: (i) a composite BCE-Dice-Lovász segmentation loss with inverse-frequency class weighting, injected into the YOLO26 training loop via runtime monkey-patching; (ii) domain-guided copy-paste augmentation that transplants minority-class structures while respecting their anatomical location relative to the brain boundary; and (iii) inter-patient stratified splitting to prevent data leakage. On 575 held-out test images, the composite loss variant reaches a mean Dice coefficient of 0.9253, a 2.68% relative gain over the baseline (0.9012), even though our mean is computed over the three foreground classes only, whereas the baseline's reported mean includes the easy background class. We further ablate each component and discuss annotation-quality and class-imbalance effects on CSP and LV performance.
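
The domain-guided copy-paste augmentation in (ii) can be sketched as below. This is a minimal NumPy illustration under assumptions (rejection sampling for placement, the brain mask as the anatomical constraint); the function name, retry budget, and placement rule are hypothetical, not the paper's code.

```python
import numpy as np

def guided_copy_paste(recipient, recipient_brain, donor, donor_mask, rng):
    """Paste the donor's minority-class region (e.g. a CSP) into
    `recipient`, but only at positions fully covered by the recipient's
    brain mask, so the pasted structure keeps plausible anatomical
    context.  Returns (augmented image, pasted mask); if no valid spot
    is found, returns the recipient unchanged with an empty mask."""
    ys, xs = np.nonzero(donor_mask)
    if ys.size == 0:
        return recipient, np.zeros_like(donor_mask)
    # Tight bounding box of the donor structure.
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    patch, patch_mask = donor[y0:y1, x0:x1], donor_mask[y0:y1, x0:x1]
    h, w = patch.shape
    H, W = recipient.shape
    sel = patch_mask > 0
    for _ in range(50):                       # rejection sampling
        ty = rng.integers(0, H - h + 1)
        tx = rng.integers(0, W - w + 1)
        region = recipient_brain[ty:ty + h, tx:tx + w]
        if np.all(region[sel]):               # structure stays inside brain
            out = recipient.copy()
            out[ty:ty + h, tx:tx + w][sel] = patch[sel]
            new_mask = np.zeros((H, W), dtype=donor_mask.dtype)
            new_mask[ty:ty + h, tx:tx + w] = patch_mask
            return out, new_mask
    return recipient, np.zeros((H, W), dtype=donor_mask.dtype)
```

A fuller version would also constrain the paste position relative to the brain boundary (e.g. matching the donor structure's normalized offset from the brain centroid), which is the "domain-guided" aspect the abstract emphasizes.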