Rethinking Uncertainty Quantification and Entanglement in Image Segmentation

arXiv cs.CV / 3/20/2026

Key Points

  • In image segmentation, total uncertainty is typically decomposed into data-related aleatoric uncertainty (AU) and model-related epistemic uncertainty (EU), but it is unclear how methods for modeling AU and EU interact when combined.
  • The authors present a comprehensive empirical study of AU–EU model combinations, introduce a metric to quantify uncertainty entanglement, and evaluate its impact on downstream UQ tasks.
  • For out-of-distribution detection, ensembles exhibit consistently lower entanglement and better performance, while the best models for ambiguity modeling and calibration are dataset-dependent.
  • Softmax/SSN-based methods perform well for ambiguity modeling and calibration, Probabilistic UNets are less entangled, and a softmax ensemble performs well on all tasks.
  • The work analyzes potential sources of uncertainty entanglement and outlines directions for mitigating this effect to improve interpretability and practical usefulness.
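
To make the idea of entanglement concrete, here is a minimal sketch of one crude proxy: the Pearson correlation between per-pixel AU and EU values. This is an illustrative assumption, not the metric introduced in the paper, whose definition is not given in this summary. The function name `pearson` and the toy values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical flattened per-pixel uncertainty maps (toy values, not real data).
au = [0.1, 0.2, 0.3, 0.4]
eu_tracking_au = [0.2, 0.4, 0.6, 0.8]   # EU rises and falls with AU
eu_unrelated = [0.3, 0.1, 0.1, 0.3]     # EU varies independently of AU

high_proxy = pearson(au, eu_tracking_au)
low_proxy = pearson(au, eu_unrelated)
```

Under this proxy, a value near 1 means the two maps carry largely the same signal, while a value near 0 suggests they capture different things. A linear correlation can miss nonlinear dependence, so it should be read only as a rough diagnostic.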

Abstract

Uncertainty quantification (UQ) is crucial in safety-critical applications such as medical image segmentation. Total uncertainty is typically decomposed into data-related aleatoric uncertainty (AU) and model-related epistemic uncertainty (EU). Many methods exist for modeling AU (such as Probabilistic UNet, Diffusion) and EU (such as ensembles, MC Dropout), but it is unclear how they interact when combined. Additionally, recent work has revealed substantial entanglement between AU and EU, undermining the interpretability and practical usefulness of the decomposition. We present a comprehensive empirical study covering a broad range of AU-EU model combinations, propose a metric to quantify uncertainty entanglement, and evaluate both across downstream UQ tasks. For out-of-distribution detection, ensembles exhibit consistently lower entanglement and superior performance. For ambiguity modeling and calibration the best models are dataset-dependent, with softmax/SSN-based methods performing well and Probabilistic UNets being less entangled. A softmax ensemble fares remarkably well on all tasks. Finally, we analyze potential sources of uncertainty entanglement and outline directions for mitigating this effect.
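
The AU/EU decomposition mentioned above is commonly computed with an entropy-based split over several predictions for the same pixel (for example from ensemble members or MC Dropout passes): total uncertainty is the entropy of the averaged prediction, AU is the average entropy of the individual predictions, and EU is the difference between the two (the mutual information). The following is a minimal single-pixel sketch of that standard formulation; it is an assumption that the paper uses this exact split, and the function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def decompose_uncertainty(member_probs):
    """Split total uncertainty for ONE pixel into aleatoric and epistemic parts.

    member_probs: list of M class-probability lists, one per ensemble member
    (or per stochastic forward pass). Returns (total, aleatoric, epistemic).
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    # Average the members' predictions class by class.
    mean_probs = [
        sum(member[c] for member in member_probs) / n_members
        for c in range(n_classes)
    ]
    total = entropy(mean_probs)                                      # entropy of the mean
    aleatoric = sum(entropy(m) for m in member_probs) / n_members    # mean of the entropies
    epistemic = total - aleatoric                                    # mutual information
    return total, aleatoric, epistemic

# Members agree that the pixel is ambiguous: all uncertainty is aleatoric.
agree = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])

# Members are individually confident but contradict each other: all epistemic.
disagree = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
```

In both toy cases the total uncertainty is the same (ln 2), which is why the decomposition matters: only the split reveals whether the uncertainty comes from ambiguous data or from model disagreement. A real segmentation pipeline would apply the same computation to every pixel with a vectorized array library.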