Entropy Alone is Insufficient for Safe Selective Prediction in LLMs

arXiv cs.CL / 3/24/2026



Key Points

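Selective prediction, in which a model abstains from answering in uncertain, high-risk cases, can reduce harms caused by language model hallucinations, but the uncertainty estimates that drive the abstention decision are rarely evaluated as part of that policy.
The paper shows that uncertainty methods based on entropy alone have a model-dependent failure mode that can make abstention behaviour unreliable.
As a remedy, it proposes combining entropy with a signal from a correctness probe, which improves abstention performance.
Across three benchmarks (TriviaQA, BioASQ, MedicalQA) and four model families, the combined score is reported to generally improve both the risk-coverage trade-off and calibration compared with entropy alone.
It concludes that uncertainty methods need deployment-oriented evaluation, using metrics that directly reflect whether a system can operate at a target risk level.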
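
The combination idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's method: the source does not say how entropy is computed or how the two signals are combined, so the exact-match answer grouping, the weighted-average rule, the `weight` and `threshold` values, and the probe probability below are all hypothetical. The example also shows one scenario where entropy alone fails (a model that repeats the same wrong answer has zero entropy); it is illustrative and not necessarily the model-dependent failure mode the paper identifies.

```python
import math
from collections import Counter


def answer_entropy(samples: list[str]) -> float:
    """Shannon entropy (in nats) of the empirical distribution over sampled answers.

    Assumption: answers count as equal if they match after lowercasing and
    stripping whitespace. Meaning-based clustering is a common alternative.
    """
    normalized = [s.strip().lower() for s in samples]
    counts = Counter(normalized)
    n = len(normalized)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def normalized_entropy(samples: list[str]) -> float:
    """Entropy rescaled to [0, 1] by its maximum, log(n), for n samples."""
    n = len(samples)
    return answer_entropy(samples) / math.log(n) if n > 1 else 0.0


def combined_uncertainty(samples: list[str], probe_p_correct: float, weight: float = 0.5) -> float:
    """Hypothetical combination rule: a weighted average of normalized entropy
    and the probe's estimated probability of being wrong (1 - p_correct)."""
    return weight * normalized_entropy(samples) + (1 - weight) * (1 - probe_p_correct)


def should_abstain(uncertainty: float, threshold: float) -> bool:
    """Selective prediction policy: answer only when uncertainty is at or below the threshold."""
    return uncertainty > threshold


# Illustrative case: the model repeats the same wrong answer, so entropy is zero,
# but a (hypothetical) probe assigns it a low probability of being correct.
samples = ["Lyon", "lyon", "Lyon", "Lyon"]
probe_p_correct = 0.1  # made-up probe output for illustration

print(should_abstain(normalized_entropy(samples), threshold=0.3))  # False
print(should_abstain(combined_uncertainty(samples, probe_p_correct), threshold=0.3))  # True
```

With entropy alone, the consistent but wrong answer passes the threshold and is returned; the combined score pushes it above the threshold, so the policy abstains.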

Abstract

Selective prediction systems can mitigate harms resulting from language model hallucinations by abstaining from answering in high-risk cases. Uncertainty quantification techniques are often employed to identify such cases, but are rarely evaluated in the context of the wider selective prediction policy and its ability to operate at low target error rates. We identify a model-dependent failure mode of entropy-based uncertainty methods that leads to unreliable abstention behaviour, and address it by combining entropy scores with a correctness probe signal. We find that across three QA benchmarks (TriviaQA, BioASQ, MedicalQA) and four model families, the combined score generally improves both the risk-coverage trade-off and calibration performance relative to entropy-only baselines. Our results highlight the importance of deployment-facing evaluation of uncertainty methods, using metrics that directly reflect whether a system can be trusted to operate at a stated risk level.
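
Evaluating a selective prediction policy at a target error rate can be sketched as follows. This is a minimal Python illustration assuming lower scores mean higher confidence and that correctness labels are available for a held-out set; the function names are hypothetical, and the paper's exact metric definitions may differ. `max_coverage_at_risk` picks the empirical threshold that maximizes coverage while keeping selective risk at or below a target; chosen on a single sample it carries no statistical guarantee, so in practice it would be fit on a calibration split and checked on separate data.

```python
def risk_coverage_curve(scores: list[float], correct: list[bool]) -> list[tuple[float, float, float]]:
    """Return (coverage, risk, threshold) after accepting the k most confident examples.

    Lower score = more confident. Risk is the error rate among accepted examples.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    curve, errors = [], 0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        curve.append((k / n, errors / k, scores[i]))
    return curve


def aurc(scores: list[float], correct: list[bool]) -> float:
    """Area under the risk-coverage curve as the mean risk over all k (lower is better).

    Simplification: tied scores are treated as separate cut points.
    """
    curve = risk_coverage_curve(scores, correct)
    return sum(risk for _, risk, _ in curve) / len(curve)


def max_coverage_at_risk(scores: list[float], correct: list[bool], target_risk: float):
    """Largest empirical coverage whose selective risk is at most target_risk.

    Returns (coverage, threshold); the policy answers when score <= threshold.
    Returns (0.0, None) if no cut point meets the target.
    """
    curve = risk_coverage_curve(scores, correct)
    best = (0.0, None)
    for idx, (cov, risk, thr) in enumerate(curve):
        # Only cut between distinct scores so "score <= thr" selects exactly this set.
        is_cut = idx == len(curve) - 1 or curve[idx + 1][2] > thr
        if is_cut and risk <= target_risk:
            best = (cov, thr)  # coverage grows with idx, so the last valid cut wins
    return best


scores = [0.1, 0.2, 0.3, 0.4]
correct = [True, True, False, True]
print(max_coverage_at_risk(scores, correct, 0.0))  # (0.5, 0.2)
print(round(aurc(scores, correct), 4))  # 0.1458
```

Running `max_coverage_at_risk` on entropy-only and combined scores at the same low target risk is one way to check whether a score lets the system answer more questions while staying within a stated risk level.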
