Beyond "I Don't Know": Evaluating LLM Self-Awareness in Discriminating Data and Model Uncertainty
arXiv cs.CL / 4/21/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that LLMs should be able to abstain when confidence is low, but existing work often treats refusals as a generic “I don’t know,” without distinguishing whether uncertainty comes from ambiguous input data or from limitations of the model itself.
- It introduces UA-Bench, a benchmark with 3,500+ questions across six datasets, specifically intended to test whether LLMs can explicitly attribute uncertainty to data uncertainty versus model uncertainty.
- An evaluation of 18 frontier LLMs finds that even top models struggle to reliably make this distinction, and that high answer accuracy does not necessarily correlate with strong uncertainty attribution.
- To address this gap, the authors propose a lightweight pipeline combining data synthesis with reinforcement learning, reporting improved uncertainty attribution while maintaining answer accuracy on Qwen3-4B-Instruct-2507 and Qwen3-8B (thinking mode).
- The authors state that their code and data are publicly available.
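To make the two evaluation axes concrete, here is a minimal sketch of the kind of scoring such a benchmark implies. The item schema, label names, and `score` function are illustrative assumptions, not the paper's actual implementation: each question carries a gold label marking it as answerable, data-uncertain (ambiguous input), or model-uncertain (beyond the model's knowledge), and a response is scored separately for answer accuracy and attribution accuracy.

```python
# Hypothetical scoring sketch (labels and schema are assumptions, not the
# paper's API). Answer accuracy and attribution accuracy are computed on
# separate axes, so a model can score high on one and low on the other.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    gold_label: str            # "answerable" | "data" | "model"
    gold_answer: Optional[str] # expected answer, if answerable


@dataclass
class Response:
    label: str                 # model's self-reported uncertainty attribution
    answer: Optional[str]      # model's answer, if it gave one


def score(items: list[Item], responses: list[Response]) -> tuple[float, float]:
    ans_correct = attr_correct = ans_total = 0
    for item, resp in zip(items, responses):
        if item.gold_label == "answerable":
            ans_total += 1
            ans_correct += int(resp.answer == item.gold_answer)
        # Attribution is scored on every item, answerable or not.
        attr_correct += int(resp.label == item.gold_label)
    answer_acc = ans_correct / max(ans_total, 1)
    attribution_acc = attr_correct / len(items)
    return answer_acc, attribution_acc
```

Under this toy metric, a model that answers every answerable question correctly but mislabels ambiguous questions as knowledge gaps would get perfect answer accuracy and poor attribution accuracy, which is exactly the dissociation the evaluation reports.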