Enhancing the Reliability of Medical AI through Expert-guided Uncertainty Modeling

arXiv cs.LG / 4/3/2026


Key Points

  • The paper addresses a core reliability challenge for medical AI: AI mistakes are unpredictable, making uncertainty estimation important for risk-aware “second opinion” systems.
  • It proposes using disagreement among human experts as training targets to better quantify aleatoric uncertainty (ambiguity/noise in data), which existing methods struggle to separate reliably.
  • The method estimates two uncertainty components using the law of total variance with a two-ensemble setup, plus a lighter variant for efficiency.
  • Experiments across image classification, segmentation, and multiple-choice QA show expert-guided training improves uncertainty estimation quality by about 9% to 50% depending on the task.
  • The authors argue that incorporating expert knowledge can make medical AI systems more trustworthy by enabling clinicians to focus verification on higher-risk cases.
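The two-component decomposition in the points above follows the law of total variance: for a binary label y and an ensemble member m, Var(y) = E_m[Var(y | m)] + Var_m(E[y | m]), where the first term captures aleatoric uncertainty and the second epistemic uncertainty. A minimal numerical sketch, using random stand-in probabilities rather than trained ensemble outputs (the array shapes and variable names are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an ensemble: M models each output a probability p for N inputs.
M, N = 8, 5
probs = rng.uniform(0.05, 0.95, size=(M, N))

# Law of total variance for a Bernoulli target y given ensemble member m:
#   Var(y) = E_m[Var(y | m)] + Var_m(E[y | m])
aleatoric = np.mean(probs * (1 - probs), axis=0)  # expected conditional variance
epistemic = np.var(probs, axis=0)                 # variance of conditional means
total = aleatoric + epistemic

# Sanity check: the sum equals the Bernoulli variance of the mixture mean.
p_bar = probs.mean(axis=0)
assert np.allclose(total, p_bar * (1 - p_bar))
```

For a Bernoulli target the identity is exact, which makes the decomposition convenient to verify; the paper's contribution is in how the aleatoric term is supervised with expert-derived targets rather than left implicit.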

Abstract

Artificial intelligence (AI) systems accelerate medical workflows and improve diagnostic accuracy in healthcare, serving as second-opinion systems. However, the unpredictability of AI errors poses a significant challenge, particularly in healthcare contexts, where mistakes can have severe consequences. A widely adopted safeguard is to pair predictions with uncertainty estimates, enabling human experts to focus on high-risk cases while streamlining routine verification. Current uncertainty estimation methods, however, remain limited, particularly in quantifying aleatoric uncertainty, which arises from data ambiguity and noise. To address this, we propose a novel approach that leverages disagreement in expert responses to generate targets for training machine learning models. These targets are used in conjunction with standard data labels to separately estimate the two components of uncertainty given by the law of total variance, via a two-ensemble approach and a lightweight variant of it. We validate our method on binary image classification, binary and multi-class image segmentation, and multiple-choice question answering. Our experiments demonstrate that incorporating expert knowledge can improve uncertainty estimation quality by 9% to 50% depending on the task, making this source of information invaluable for constructing risk-aware AI systems in healthcare applications.
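One natural way to turn expert disagreement into a training target, as the abstract describes, is to treat the spread of annotations per case as an aleatoric-uncertainty label. The sketch below assumes binary annotations from K experts per case and uses the empirical vote variance as the target; the data and variable names are hypothetical, not taken from the paper:

```python
import numpy as np

# Hypothetical: 4 expert annotations (binary labels) per case.
expert_labels = np.array([
    [1, 1, 1, 1],  # full agreement -> unambiguous case
    [1, 0, 1, 0],  # split vote     -> highly ambiguous case
    [1, 1, 0, 1],  # mild disagreement
])

p_expert = expert_labels.mean(axis=1)        # empirical label frequency per case
disagreement = p_expert * (1 - p_expert)     # Bernoulli variance of the votes

# `disagreement` would serve as a regression target for an uncertainty head,
# trained jointly with the usual majority-vote classification label.
majority = (p_expert >= 0.5).astype(int)
```

The appeal of such a target is that it requires no extra labeling effort when multiple annotations already exist, which is common in medical imaging datasets.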