Uncertainty-Aware Transformers: Conformal Prediction for Language Models

arXiv cs.LG · April 13, 2026


Key Points

  • The paper introduces CONFIDE, an uncertainty quantification framework that applies conformal prediction to transformer-based language models to produce statistically valid prediction sets rather than single black-box outputs.
  • CONFIDE constructs class-conditional nonconformity scores using either [CLS] token embeddings or flattened hidden states for encoder-only models like BERT and RoBERTa, while also supporting hyperparameter tuning.
  • Experiments show that CONFIDE improves test accuracy by up to 4.09% on BERT-tiny and achieves better “correct efficiency,” i.e., smaller expected prediction sets conditioned on the set containing the true label.
  • The study finds that earlier and intermediate transformer layers often provide better-calibrated and more semantically meaningful representations for conformal prediction.
  • The authors argue CONFIDE is especially useful for resource-constrained models and high-stakes tasks with ambiguous labels, where softmax-based uncertainty can be unreliable and where instance-level explanations are needed.
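
The class-conditional construction described above can be illustrated with a minimal split-conformal sketch. The nonconformity score here is Euclidean distance to the class centroid of [CLS] embeddings — an illustrative choice, not necessarily CONFIDE's exact score — and the function names are hypothetical:

```python
import numpy as np

def fit_class_conditional_conformal(cal_emb, cal_labels, alpha=0.1):
    """Fit per-class (Mondrian) conformal thresholds on a calibration split.

    Nonconformity = Euclidean distance to the class centroid of the
    embeddings (illustrative; the paper's exact score may differ).
    Returns class centroids and, for each class, the conformal quantile
    of its calibration nonconformity scores.
    """
    classes = np.unique(cal_labels)
    centroids = {c: cal_emb[cal_labels == c].mean(axis=0) for c in classes}
    thresholds = {}
    for c in classes:
        scores = np.linalg.norm(cal_emb[cal_labels == c] - centroids[c], axis=1)
        n = len(scores)
        # Finite-sample conformal quantile: ceil((n+1)(1-alpha))/n, capped at 1.
        q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        thresholds[c] = np.quantile(scores, q)
    return centroids, thresholds

def predict_set(x, centroids, thresholds):
    """Prediction set: every class whose nonconformity for x is below its threshold."""
    return {c for c in centroids
            if np.linalg.norm(x - centroids[c]) <= thresholds[c]}
```

Because thresholds are computed per class, coverage is calibrated conditionally on the true label — the property that makes the resulting sets valid even when classes differ in embedding spread.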

Abstract

Transformers have had a profound impact on the field of artificial intelligence, especially on large language models and their variants. However, as was the case with neural networks, their black-box nature limits trust and deployment in high-stakes settings. For models to be genuinely useful and trustworthy in critical applications, they must provide more than just predictions: they must supply users with a clear understanding of the reasoning that underpins their decisions. This article presents an uncertainty quantification framework for transformer-based language models. This framework, called CONFIDE (CONformal prediction for FIne-tuned DEep language models), applies conformal prediction to the internal embeddings of encoder-only architectures, like BERT and RoBERTa, while enabling hyperparameter tuning. CONFIDE uses either [CLS] token embeddings or flattened hidden states to construct class-conditional nonconformity scores, enabling statistically valid prediction sets with instance-level explanations. Empirically, CONFIDE improves test accuracy by up to 4.09% on BERT-tiny and achieves greater correct efficiency (i.e., the expected size of the prediction set conditioned on it containing the true label) compared to prior methods, including NM2 and VanillaNN. We show that early and intermediate transformer layers often yield better-calibrated and more semantically meaningful representations for conformal prediction. In resource-constrained models and high-stakes tasks with ambiguous labels, CONFIDE offers robustness and interpretability where softmax-based uncertainty fails. We position CONFIDE as a framework for practical diagnostic and efficiency/robustness improvement over prior conformal baselines.
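
The "correct efficiency" metric the abstract defines — the expected prediction-set size conditioned on the set containing the true label — is straightforward to compute. A minimal sketch, assuming prediction sets are given as Python sets (the function name is hypothetical):

```python
def correct_efficiency(pred_sets, true_labels):
    """Mean size of the prediction sets that contain the true label.

    Smaller is better: among the sets that cover the truth, we want
    them as tight as possible. Returns NaN if no set covers its label.
    """
    sizes = [len(s) for s, y in zip(pred_sets, true_labels) if y in s]
    return sum(sizes) / len(sizes) if sizes else float("nan")
```

For example, with sets `[{0, 1}, {1}, {2}]` and labels `[0, 1, 0]`, only the first two sets cover their labels, so the metric averages their sizes (2 and 1) and ignores the miss.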
