CURA: Clinical Uncertainty Risk Alignment for Language Model-Based Risk Prediction

arXiv cs.CL / 4/17/2026

Key Points

  • The paper introduces CURA, a framework to make uncertainty estimates from clinical language-model risk predictors more reliable and clinically calibrated.
  • CURA fine-tunes domain-specific clinical LMs to produce patient embeddings, then performs uncertainty-focused fine-tuning of a multi-head classifier using a bi-level objective.
  • It calibrates uncertainty at the individual level by aligning predicted risk uncertainty with each patient’s likelihood of error, and at the cohort level by regularizing toward event rates in embedding-space neighborhoods.
  • Experiments on MIMIC-IV across multiple clinical LMs show CURA improves calibration metrics while largely preserving discrimination performance.
  • The method reduces overconfident false reassurance and produces more trustworthy uncertainty outputs for clinical decision support use cases.
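The bi-level objective described above can be sketched numerically. This is a hedged illustration, not the paper's implementation: the function name, the squared-error form of the individual calibration term, and the `lam`/`gamma` hyperparameters are all assumptions; the paper specifies only that one term aligns uncertainty with per-patient error likelihood and the other pulls risk toward neighborhood event rates, up-weighting ambiguous cohorts.

```python
import numpy as np

def bilevel_uncertainty_loss(probs, labels, neighbor_rates, lam=0.5, gamma=2.0):
    """Illustrative sketch of a CURA-style bi-level objective (all names
    and functional forms here are assumptions, not the paper's exact loss).

    probs          : predicted event probabilities, shape (n,)
    labels         : binary outcomes, shape (n,)
    neighbor_rates : empirical event rate among each patient's nearest
                     neighbors in the embedding space, shape (n,)
    lam            : weight on the cohort-aware regularizer (assumed)
    gamma          : sharpness of the ambiguity weighting (assumed)
    """
    eps = 1e-12
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    y = np.asarray(labels, dtype=float)
    r = np.asarray(neighbor_rates, dtype=float)

    # Individual-level calibration: push predictive uncertainty
    # (0 when confident, 1 at p = 0.5) toward the error indicator.
    error = (np.round(p) != y).astype(float)  # 1 iff the hard prediction is wrong
    uncertainty = 1.0 - np.abs(2.0 * p - 1.0)
    indiv = np.mean((uncertainty - error) ** 2)

    # Cohort-aware regularizer: pull risk toward the local event rate,
    # with extra weight on ambiguous cohorts near the decision boundary.
    ambiguity = (1.0 - np.abs(2.0 * r - 1.0)) ** gamma
    cohort = np.mean(ambiguity * (p - r) ** 2)

    return float(indiv + lam * cohort)
```

Under this sketch, a confidently wrong prediction incurs a large individual-level penalty, while predictions in ambiguous neighborhoods are pulled toward the observed local event rate rather than toward 0 or 1.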

Abstract

Clinical language models (LMs) are increasingly applied to support clinical risk prediction from free-text notes, yet their uncertainty estimates often remain poorly calibrated and clinically unreliable. In this work, we propose Clinical Uncertainty Risk Alignment (CURA), a framework that aligns clinical LM-based risk estimates and uncertainty with both individual error likelihoods and cohort-level ambiguities. CURA first fine-tunes domain-specific clinical LMs to obtain task-adapted patient embeddings, and then performs uncertainty fine-tuning of a multi-head classifier using a bi-level uncertainty objective. Specifically, an individual-level calibration term aligns predictive uncertainty with each patient's likelihood of error, while a cohort-aware regularizer pulls risk estimates toward event rates in their local neighborhoods in the embedding space and places extra weight on ambiguous cohorts near the decision boundary. We further show that this cohort-aware term can be interpreted as a cross-entropy loss with neighborhood-informed soft labels, providing a label-smoothing view of our method. Extensive experiments on MIMIC-IV clinical risk prediction tasks across various clinical LMs show that CURA consistently improves calibration metrics without substantially compromising discrimination. Further analysis illustrates that CURA reduces overconfident false reassurance and yields more trustworthy uncertainty estimates for downstream clinical decision support.
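The abstract notes that the cohort-aware term can be read as cross-entropy against neighborhood-informed soft labels, i.e. a form of label smoothing. The sketch below illustrates that view; the mixing function and the `alpha` coefficient are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def neighborhood_soft_labels(labels, neighbor_rates, alpha=0.3):
    """Mix each patient's hard outcome with the event rate of its
    embedding-space neighborhood (the mixing weight alpha is assumed)."""
    y = np.asarray(labels, dtype=float)
    r = np.asarray(neighbor_rates, dtype=float)
    return (1.0 - alpha) * y + alpha * r

def soft_label_cross_entropy(probs, soft_labels):
    """Binary cross-entropy against soft targets: the label-smoothing
    reading of the cohort-aware term described in the abstract."""
    eps = 1e-12
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    q = np.asarray(soft_labels, dtype=float)
    return float(np.mean(-(q * np.log(p) + (1.0 - q) * np.log(1.0 - p))))
```

With `alpha = 0` this reduces to ordinary cross-entropy on the hard labels; with `alpha > 0`, targets shift toward the local event rate, so the loss is minimized by risk estimates that respect neighborhood-level ambiguity instead of collapsing to 0 or 1.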