CLEAR: Revealing How Noise and Ambiguity Degrade Reliability in LLMs for Medicine
arXiv cs.CL / 5/5/2026
Key Points
- The CLEAR framework is proposed to measure how ambiguity and uncertainty in decision-space design affect medical LLM reliability, going beyond simplified exam-style benchmarks.
- The evaluation systematically perturbs the number of plausible answer options, whether ground-truth/abstention options exist, and how answer options are semantically framed.
- Applying CLEAR across three medical benchmarks and 17 LLMs shows that increasing the number of plausible answer options reduces both correct-answer selection and safe abstention from wrong answers.
- Reliability drops further when abstention is framed as uncertainty (“I don’t know”) rather than assertive rejection (“None of the Above”), and simply adding an “I don’t know” option can increase incorrect selections.
- The paper introduces a “humility deficit” that quantifies the gap between choosing correct answers and abstaining from incorrect ones, and finds this gap worsens as model size increases.
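The "humility deficit" described above can be sketched as a simple gap metric. The paper's exact formulation is not given in this summary, so the sketch below assumes one plausible definition: the rate of choosing the correct option when one is present, minus the rate of abstaining when no listed option is correct. The function name and the boolean-list input format are illustrative, not from the paper.

```python
def humility_deficit(answerable_results, unanswerable_results):
    """Assumed form of the humility-deficit metric.

    answerable_results: list of bools, True if the model picked the
        correct option when one was present.
    unanswerable_results: list of bools, True if the model abstained
        (e.g. chose "None of the Above" or "I don't know") when no
        listed option was correct.
    """
    correct_rate = sum(answerable_results) / len(answerable_results)
    abstain_rate = sum(unanswerable_results) / len(unanswerable_results)
    return correct_rate - abstain_rate

# Toy example: a model that often picks correct answers but rarely abstains
# shows a large deficit, matching the pattern the paper reports for larger models.
deficit = humility_deficit(
    answerable_results=[True, True, True, False],     # 75% correct selection
    unanswerable_results=[True, False, False, False], # 25% safe abstention
)
print(round(deficit, 2))  # 0.5
```

Under this reading, a deficit near zero means the model is as willing to abstain from wrong answers as it is able to pick right ones; the paper finds the gap widens with model size.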