LLMs as Signal Detectors: Sensitivity, Bias, and the Temperature-Criterion Analogy
arXiv cs.CL / 3/17/2026
Key Points
- The paper argues that calibration metrics for LLMs conflate sensitivity and bias, and proposes using Signal Detection Theory (SDT) to separate these components for more precise evaluation.
- It employs a full parametric SDT framework (unequal-variance modeling, criterion estimation, and z-ROC analysis) across 168,000 trials and three LLMs.
- It tests whether sampling temperature acts as a criterion shift, in the way payoff manipulations do in human psychophysics, and finds that the analogy can break down because temperature also changes the generated output itself.
- The results show unequal-variance evidence distributions across the models, with instruct models showing more pronounced asymmetry in their z-ROC slopes. They also show that calibration metrics alone cannot tell whether two models differ in sensitivity or in bias, which is the case for using the full SDT framework.
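The sensitivity/bias split in the first point can be made concrete with the textbook equal-variance SDT formulas, d' = z(H) - z(F) and c = -[z(H) + z(F)] / 2, where H and F are hit and false-alarm rates. The sketch below is illustrative only: the function names `sdt_from_rates` and `rates_for_criterion` are hypothetical, and it does not reproduce the paper's unequal-variance estimation. It simulates a pure criterion shift, the role the paper tests temperature for, to show that such a shift moves c while leaving d' unchanged.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal: mean 0, sd 1

def sdt_from_rates(hit_rate, fa_rate):
    """Equal-variance Gaussian SDT: return (d_prime, c) from hit and false-alarm rates.

    Rates must lie strictly between 0 and 1; correct raw counts first
    (for example, add 0.5 to each cell) if any rate is exactly 0 or 1.
    """
    z_h = _Z.inv_cdf(hit_rate)
    z_f = _Z.inv_cdf(fa_rate)
    d_prime = z_h - z_f       # sensitivity: distance between signal and noise means
    c = -0.5 * (z_h + z_f)    # bias: criterion relative to the midpoint (c > 0 is conservative)
    return d_prime, c

def rates_for_criterion(d_prime, k):
    """Expected (hit, false-alarm) rates when noise ~ N(0, 1), signal ~ N(d_prime, 1),
    and the observer answers 'signal' whenever the evidence exceeds criterion k."""
    hit = 1.0 - _Z.cdf(k - d_prime)
    fa = 1.0 - _Z.cdf(k)
    return hit, fa

# Sweep the criterion with sensitivity held fixed: d' stays at 1.5 while c = k - 0.75.
for k in (0.0, 0.75, 1.5):
    d, c = sdt_from_rates(*rates_for_criterion(1.5, k))
    print(f"k={k:.2f}  d'={d:.3f}  c={c:+.3f}")
```

A temperature change that behaved purely like a criterion shift would produce this pattern. The paper's caveat is that temperature also alters what the model generates, so a change in measured sensitivity would not be ruled out.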
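The unequal-variance finding and the z-ROC slopes can be illustrated with a minimal sketch, assuming confidence-rating data binned from "most confident signal" to "most confident noise". With noise ~ N(0, 1) and signal ~ N(mu_s, sigma_s), the ROC points plotted in z-space fall on a line z(H) = mu_s/sigma_s + (1/sigma_s) z(F). A slope of 1 therefore means equal variances, and any departure from 1 signals unequal variances. The helper names `cumulative_rates` and `fit_zroc` are hypothetical. Ordinary least squares in z-space is a simplification chosen for brevity. Maximum-likelihood fitting is the usual approach, and this is not presented as the paper's estimator.

```python
from math import sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal: mean 0, sd 1

def cumulative_rates(signal_counts, noise_counts):
    """Turn per-bin rating counts into ROC points (hit rates, false-alarm rates).

    Bins run from 'most confident signal' to 'most confident noise'. Each
    boundary between bins gives one point; the trivial (1, 1) point is dropped.
    """
    n_s, n_n = sum(signal_counts), sum(noise_counts)
    hits, fas = [], []
    cum_s = cum_n = 0
    for s, n in zip(signal_counts[:-1], noise_counts[:-1]):
        cum_s += s
        cum_n += n
        hits.append(cum_s / n_s)
        fas.append(cum_n / n_n)
    return hits, fas

def fit_zroc(hit_rates, fa_rates):
    """Least-squares line z(H) = a + b * z(F) through the ROC points in z-space.

    Under the unequal-variance Gaussian model, b = 1 / sigma_s and a = mu_s / sigma_s.
    Rates of exactly 0 or 1 must be corrected beforehand.
    """
    zf = [_Z.inv_cdf(f) for f in fa_rates]
    zh = [_Z.inv_cdf(h) for h in hit_rates]
    n = len(zf)
    mean_f, mean_h = sum(zf) / n, sum(zh) / n
    sxx = sum((x - mean_f) ** 2 for x in zf)
    sxy = sum((x - mean_f) * (y - mean_h) for x, y in zip(zf, zh))
    slope = sxy / sxx
    intercept = mean_h - slope * mean_f
    # d_a: separation of the means in units of the root-mean-square of the two SDs
    d_a = intercept * sqrt(2.0 / (1.0 + slope ** 2))
    return slope, intercept, d_a

# Synthetic check: points generated from mu_s = 1.5, sigma_s = 1.25 at four criteria.
mu_s, sigma_s = 1.5, 1.25
ks = (-0.5, 0.25, 1.0, 1.75)
hits = [1.0 - _Z.cdf((k - mu_s) / sigma_s) for k in ks]
fas = [1.0 - _Z.cdf(k) for k in ks]
print(fit_zroc(hits, fas))  # slope near 0.8, intercept near 1.2
```

Because the synthetic points come from a known model, the fitted slope should recover 1 / 1.25 = 0.8. A slope below 1, as here, means the signal distribution is wider than the noise distribution.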