ALIEN: Aligned Entropy Head for Improving Uncertainty Estimation of LLMs

arXiv stat.ML / 4/7/2026


Key Points

  • The paper identifies a limitation of predictive entropy for uncertainty estimation in LLM adaptation: it under-captures factors like class overlap and ambiguous cues, leading to overconfidence on difficult inputs.
  • It proposes ALIEN (Aligned Entropy), a lightweight uncertainty head initialized to reproduce the model’s original entropy and then fine-tuned with regularization that aligns the entropy score with prediction reliability.
  • Across seven classification datasets and two NER benchmarks, evaluated on multiple language models (RoBERTa, ELECTRA, LLaMA-2, Qwen2.5, Qwen3), ALIEN improves incorrect-prediction detection and achieves the lowest calibration error versus strong baselines.
  • The method is designed for deployment: it adds only small inference overhead (milliseconds per batch on CPU) and increases parameter count minimally (about 0.002% for decoder models and 0.5% for encoder models) without needing intermediate-state storage.
  • The authors argue that refining entropy via supervised alignment can yield more reliable uncertainty estimates while preserving the original backbone architecture, supporting large-scale practical use.
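To make the starting point concrete: the baseline the paper builds on is predictive entropy, the Shannon entropy of the model's softmax output. The sketch below (standard definitions, not code from the paper) shows why entropy alone can be a blunt instrument: a sharply peaked distribution scores near zero even when the input is actually hard, which is exactly the overconfidence failure mode the key points describe.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predictive_entropy(probs):
    """Shannon entropy of the predictive distribution, in nats.
    High entropy = uncertain prediction; low entropy = confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = softmax([8.0, 0.0, 0.0])   # peaked distribution -> entropy near 0
ambiguous = softmax([1.0, 0.9, 0.8])   # near-uniform -> entropy near log(3)

print(predictive_entropy(confident))   # small, even if the input is hard
print(predictive_entropy(ambiguous))
```

If the model happens to be miscalibrated on a difficult input, the first case still reports near-zero uncertainty; ALIEN's supervised alignment is aimed at correcting precisely that gap.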

Abstract

Uncertainty estimation remains a key challenge when adapting pre-trained language models to downstream classification tasks, with overconfidence often observed for difficult inputs. While predictive entropy provides a strong baseline for uncertainty estimation, it considers mainly aleatoric uncertainty and has limited capacity to capture effects such as class overlap or ambiguous linguistic cues. We introduce ALIEN (Aligned Entropy), a lightweight method that refines entropy-based uncertainty by aligning it with prediction reliability. ALIEN trains a small uncertainty head, initialized to produce the model's original entropy and subsequently fine-tuned with two regularization mechanisms. Experiments across seven classification datasets and two NER benchmarks, evaluated on five language models (RoBERTa, ELECTRA, LLaMA-2, Qwen2.5, and Qwen3), show that ALIEN consistently outperforms strong baselines across all considered scenarios in detecting incorrect predictions, while achieving the lowest calibration error. The proposed method introduces only a small inference overhead (on the order of milliseconds per batch on CPU) and increases the model's parameter count by just 0.002% for decoder models and 0.5% for encoder models, without requiring storage of intermediate states. It improves uncertainty estimation while preserving the original model architecture, making the approach practical for large-scale deployment with modern language models. Our results demonstrate that entropy can be effectively refined through lightweight supervised alignment, producing more reliable uncertainty estimates without modifying the backbone model. The code is available at 4.