Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory

arXiv cs.CL / 3/27/2026



Key Points

The paper argues that common LLM confidence calibration metrics (e.g., ECE, Brier score) conflate two abilities: Type-1 sensitivity (how much the model knows) and Type-2 metacognitive sensitivity (how well it knows what it knows).
It proposes an evaluation framework based on Type-2 Signal Detection Theory, using meta-d' to measure metacognitive capacity and the M-ratio (meta-d' divided by d') to measure metacognitive efficiency.
Experiments on four LLMs across 224,000 factual QA trials show large differences in metacognitive efficiency even when Type-1 sensitivity is similar, including cases where a model ranks highest by d' but lowest by M-ratio.
The study finds that metacognitive efficiency is domain-specific and can be shifted by temperature changes, indicating that, for some models, the confidence policy (Type-2 criterion) can move independently of the underlying metacognitive capacity.
It reports that AUROC_2 and M-ratio can produce fully inverted model rankings, suggesting these metrics answer fundamentally different evaluation questions, with implications for model selection and deployment.
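
To make the quantities in the key points concrete, here is a minimal Python sketch of the standard textbook definitions, not the paper's own code. The function names `d_prime`, `auroc2`, and `m_ratio` are hypothetical, and the inputs (hit and false-alarm rates, per-trial confidences and correctness flags) are assumed for illustration. Fitting meta-d' itself requires a model-fitting procedure that this summary does not describe, so `m_ratio` simply takes meta-d' as a given value.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Type-1 sensitivity: z(hit rate) - z(false-alarm rate).

    Rates must lie strictly between 0 and 1, since the inverse normal
    CDF is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def auroc2(confidences, correct) -> float:
    """Type-2 AUROC: probability that a randomly chosen correct trial
    carries higher confidence than a randomly chosen incorrect trial,
    with ties counted as one half.
    """
    right = [c for c, ok in zip(confidences, correct) if ok]
    wrong = [c for c, ok in zip(confidences, correct) if not ok]
    if not right or not wrong:
        raise ValueError("need at least one correct and one incorrect trial")
    wins = sum((r > w) + 0.5 * (r == w) for r in right for w in wrong)
    return wins / (len(right) * len(wrong))


def m_ratio(meta_d: float, d: float) -> float:
    """Metacognitive efficiency: meta-d' relative to Type-1 d'.

    meta_d is assumed to come from a separate fitting step.
    """
    if d == 0:
        raise ValueError("M-ratio is undefined when d' is zero")
    return meta_d / d
```

Because the M-ratio normalizes by d' while AUROC_2 does not, the two can order the same set of models differently, which is consistent with the ranking inversions the paper reports.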
