From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs
arXiv cs.AI / 4/8/2026
Key Points
- The paper tackles hallucination safety in medical LLMs by focusing on diabetic retinopathy decision settings where evidence can be insufficient or conflicting.
- It introduces RETINA-SAFE, a retinal-evidence benchmark of 12,522 samples organized into three evidence-relation tasks: E-Align, E-Conflict, and E-Gap.
- The authors propose ECRT (Evidence-Conditioned Risk Triage), a two-stage white-box framework that first triages cases as Safe vs Unsafe and then attributes unsafe cases to contradiction-driven vs evidence-gap risk types.
- ECRT draws on internal representations and logit shifts under CTX/NOCTX conditions, is trained with class balancing, and is evaluated across multiple model backbones on evidence-grouped (not patient-disjoint) splits.
- ECRT improves Stage-1 balanced accuracy by +0.15 to +0.19 over external uncertainty and self-consistency baselines and by +0.02 to +0.07 over the strongest adapted supervised baseline, which points to interpretable, evidence-grounded risk triage as a practical direction.
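
To make two of the terms above concrete, here is a minimal Python sketch of a logit shift and of balanced accuracy for Safe vs Unsafe triage. It is an illustration under stated assumptions, not the paper's implementation: the function names and toy numbers are hypothetical, and reading CTX/NOCTX as "with" and "without" evidence context in the prompt is an assumption, since the summary does not define the terms.

```python
from typing import Hashable, List, Sequence


def logit_shift(ctx_logits: Sequence[float], noctx_logits: Sequence[float]) -> List[float]:
    """Per-class change in logits between a CTX pass and a NOCTX pass.

    Assumption: CTX means the evidence context is given to the model and
    NOCTX means it is withheld. A large shift suggests the answer depends
    heavily on the supplied evidence.
    """
    if len(ctx_logits) != len(noctx_logits):
        raise ValueError("logit vectors must have the same length")
    return [c - n for c, n in zip(ctx_logits, noctx_logits)]


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall, so a rare class counts as much as a common one."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    recalls = []
    for cls in sorted(set(y_true), key=str):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): a triage model that always answers "safe".
labels = ["safe", "safe", "safe", "safe", "unsafe"]
always_safe = ["safe"] * 5
print(balanced_accuracy(labels, always_safe))
print(logit_shift([2.0, 0.5], [1.0, 1.5]))
```

In the toy data, a model that always answers "safe" is right on 4 of 5 cases, yet its balanced accuracy is only 0.5 because it misses every unsafe case. This is why balanced accuracy is a natural metric when unsafe cases are the minority.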