DeferredSeg: A Multi-Expert Deferral Framework for Trustworthy Medical Image Segmentation

arXiv cs.CV / 4/15/2026

Key Points

  • The paper introduces DeferredSeg, a deferral-aware medical image segmentation framework designed to improve the trustworthiness of segmentation by handling overconfidence/underconfidence in ambiguous regions.
  • DeferredSeg adds an aggregated deferral predictor and routing channels so each pixel can be sent either to a base segmentor or to a human expert, implementing a Human–AI collaboration approach.
  • Training uses a pixel-wise surrogate collaboration loss to supervise deferral decisions efficiently and a spatial-coherence loss to keep deferral regions smooth and spatially consistent.
  • The framework is extended to a multi-expert setting with multiple discrepancy experts and a load-balancing penalty that spreads workload evenly, so that no expert is overloaded or underused.
  • Experiments on three medical datasets, with MedSAM and CENet as base segmentors, show DeferredSeg consistently outperforming the baseline; the authors also describe the framework as model-agnostic and applicable to other segmentation architectures.
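
The per-pixel routing described in the key points can be illustrated with a small sketch. This is not the authors' implementation: the function names, the two-channel score layout `(keep, defer)`, and the hard "defer wins" rule are all assumptions made for illustration, and a real system would operate on tensors produced by the base segmentor rather than on nested lists.

```python
# Hypothetical sketch of per-pixel routing between a base segmentor and one expert.
# Names and the (keep, defer) score layout are illustrative assumptions.

def route_pixels(routing_scores, base_mask, expert_mask):
    """Return (final_mask, defer_mask) for 2D grids of equal shape.

    routing_scores[r][c] is a (keep_score, defer_score) pair.
    """
    final_mask, defer_mask = [], []
    for score_row, base_row, expert_row in zip(routing_scores, base_mask, expert_mask):
        final_row, defer_row = [], []
        for (keep, defer), base_label, expert_label in zip(score_row, base_row, expert_row):
            deferred = defer > keep  # defer only when the defer channel wins
            defer_row.append(deferred)
            final_row.append(expert_label if deferred else base_label)
        final_mask.append(final_row)
        defer_mask.append(defer_row)
    return final_mask, defer_mask


def deferral_rate(defer_mask):
    """Fraction of pixels handed to the expert."""
    flat = [d for row in defer_mask for d in row]
    return sum(flat) / len(flat)


# Toy 2x2 example: the right-hand column is deferred to the expert.
scores = [[(0.9, 0.1), (0.4, 0.6)],
          [(0.7, 0.3), (0.2, 0.8)]]
base = [[1, 0], [0, 0]]
expert = [[1, 1], [0, 1]]
final, deferred = route_pixels(scores, base, expert)
```

In this toy case the final mask takes the expert's labels in the deferred column and the base segmentor's labels elsewhere, and `deferral_rate(deferred)` reports how much of the image the expert would need to review.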

Abstract

Segmentation models based on deep neural networks demonstrate strong generalization for medical image segmentation. However, they often exhibit overconfidence or underconfidence, leading to unreliable confidence scores for segmentation masks, especially in ambiguous regions. This undermines the trustworthiness required for clinical deployment. Motivated by the learning-to-defer (L2D) paradigm, we introduce DeferredSeg, a deferral-aware segmentation framework, i.e., a Human–AI collaboration system that determines whether to defer predictions to human experts in specific regions. DeferredSeg extends the base segmentor with an aggregated deferral predictor and additional routing channels that dynamically route each pixel to either the base segmentor or a human expert. To train this routing efficiently, we introduce a pixel-wise surrogate collaboration loss that supervises deferral decisions. In addition, to preserve spatial coherence within deferral regions, we propose a spatial-coherence loss that enforces smooth deferral masks, thereby enhancing reliability. Beyond single-expert deferral, we extend the framework to a multi-expert setting by introducing multiple discrepancy experts for collaborative decision-making. To prevent overloading or underutilizing individual experts, we design a load-balancing penalty that evenly distributes workload across expert branches. We evaluate DeferredSeg on three challenging medical datasets using MedSAM and CENet as base segmentors for fair comparison. Experimental results show that DeferredSeg consistently outperforms the baseline, demonstrating its effectiveness for trustworthy dense medical segmentation. Moreover, the proposed framework is model-agnostic and can be readily applied to other segmentation architectures.
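
The abstract names three training terms without giving formulas, so the sketch below only illustrates plausible forms and should not be read as the paper's definitions. It assumes the standard softmax surrogate from the learning-to-defer literature for the per-pixel loss, a total-variation-style penalty for spatial coherence, and a squared deviation from uniform usage for load balancing. All function names are hypothetical.

```python
import math

# Illustrative stand-ins for the three loss terms named in the abstract.
# These forms are assumptions, not the paper's definitions.

def l2d_surrogate(probs, label, expert_label):
    """Softmax learning-to-defer surrogate for one pixel.

    probs holds K class probabilities followed by one defer probability.
    The defer term is rewarded only when the expert's label is correct.
    """
    loss = -math.log(probs[label])
    if expert_label == label:
        loss -= math.log(probs[-1])
    return loss


def spatial_coherence(defer_prob):
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    rows, cols = len(defer_prob), len(defer_prob[0])
    total, count = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += abs(defer_prob[r][c] - defer_prob[r][c + 1])
                count += 1
            if r + 1 < rows:
                total += abs(defer_prob[r][c] - defer_prob[r + 1][c])
                count += 1
    return total / count if count else 0.0


def load_balance_penalty(expert_usage):
    """Sum of squared deviations of each expert's usage share from the uniform share."""
    total = sum(expert_usage)
    if total == 0:
        return 0.0
    uniform = 1.0 / len(expert_usage)
    return sum((u / total - uniform) ** 2 for u in expert_usage)


smooth = spatial_coherence([[0.5, 0.5], [0.5, 0.5]])  # uniform map: no penalty
noisy = spatial_coherence([[0.0, 1.0], [1.0, 0.0]])   # checkerboard: maximal penalty
balanced = load_balance_penalty([10, 10, 10])
skewed = load_balance_penalty([30, 0, 0])
```

Under these assumed forms, a checkerboard deferral map is penalised far more than a uniform one, and routing every pixel to a single expert is penalised more than an even split, which matches the qualitative behaviour the abstract describes.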