Adaptive Conformal Prediction for Improving Factuality of Generations by Large Language Models

arXiv cs.CL / 4/16/2026



Key Points

The paper addresses a key limitation of current conformal prediction methods for LLM factuality: they are typically not prompt-adaptive, so their calibration does not reflect how factuality risk varies from one input to another.
It proposes an adaptive conformal prediction framework that extends conformal score transformation for LLMs, enabling prompt-dependent calibration while preserving marginal coverage guarantees.
The method is applied to long-form generation and multiple-choice question answering, where factuality risk varies with the prompt, and it improves conditional coverage in both settings.
It supports selective prediction by filtering unreliable claims or answer choices before downstream use.
Experiments on multiple white-box LLMs and domains show significant gains over existing baselines in conditional coverage metrics.
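To make the selective-prediction idea in the points above concrete, here is a minimal sketch of the standard, non-adaptive split conformal filter that work in this area typically builds on. It is not the paper's method. The function names, the (confidence, is_correct) claim format, and the toy calibration data are all assumptions made for illustration. A single global threshold is calibrated so that, marginally over prompts, the retained claims are all correct with probability at least 1 - alpha.

```python
import math


def prompt_score(claims):
    """Nonconformity score for one calibration prompt.

    `claims` is a list of (confidence, is_correct) pairs (hypothetical format).
    The score is the highest confidence given to any false claim, i.e. the
    smallest threshold that removes every false claim; 0.0 if none are false.
    """
    false_confidences = [conf for conf, is_correct in claims if not is_correct]
    return max(false_confidences, default=0.0)


def conformal_threshold(calibration, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    scores = sorted(prompt_score(claims) for claims in calibration)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration prompts for this alpha: filter everything.
        return float("inf")
    return scores[k - 1]


def filter_claims(claims, threshold):
    """Keep only (text, confidence) claims whose confidence exceeds the threshold."""
    return [(text, conf) for text, conf in claims if conf > threshold]


# Toy calibration set: each prompt has one true claim and one false claim.
calibration = [[(0.9, True), (s, False)] for s in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7)]
threshold = conformal_threshold(calibration, alpha=0.25)
kept = filter_claims([("claim A", 0.95), ("claim B", 0.6), ("claim C", 0.3)], threshold)
```

Because the threshold is the same for every prompt, this baseline can be too strict on easy prompts and too lenient on hard ones, which is the limitation the key points describe.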

Abstract

Large language models (LLMs) are prone to generating factually incorrect outputs. Recent work has applied conformal prediction to provide uncertainty estimates and statistical guarantees for the factuality of LLM generations. However, existing approaches are typically not prompt-adaptive, limiting their ability to capture input-dependent variability. As a result, they may filter out too few items (leading to over-coverage) or too many (under-coverage) for a given task or prompt. We propose an adaptive conformal prediction approach that extends conformal score transformation methods to LLMs, with applications to long-form generation and multiple-choice question answering. This enables prompt-dependent calibration, retaining marginal coverage guarantees while improving conditional coverage. In addition, the approach naturally supports selective prediction, allowing unreliable claims or answer choices to be filtered out in downstream applications. We evaluate our approach on multiple white-box models across diverse domains and show that it significantly outperforms existing baselines in terms of conditional coverage.
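The abstract does not spell out the score transformation, so the following is only a generic sketch of one well-known way to make a conformal threshold input-dependent: dividing each calibration score by a prompt-specific scale before taking the quantile, then rescaling at test time. The `scale` values (a hypothetical per-prompt difficulty estimate, assumed positive and fitted on data separate from the calibration set) and the function name are assumptions, and the paper's actual transformation may differ. Since the same fixed transformation is applied to calibration and test prompts, the transformed scores stay exchangeable and the marginal guarantee is preserved, while the threshold now moves with the prompt.

```python
import math


def adaptive_threshold(cal_scores, cal_scales, test_scale, alpha):
    """Prompt-dependent threshold from scale-normalized conformal scores.

    cal_scores: nonconformity score of each calibration prompt.
    cal_scales: positive per-prompt scale estimates (hypothetical difficulty
                predictor, assumed fitted on held-out data).
    test_scale: scale estimate for the test prompt.
    """
    if any(scale <= 0 for scale in cal_scales) or test_scale <= 0:
        raise ValueError("scale estimates must be positive")
    # Transform each score by its prompt's scale, then calibrate as usual.
    ratios = sorted(score / scale for score, scale in zip(cal_scores, cal_scales))
    n = len(ratios)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    # Undo the transformation for the test prompt.
    return ratios[k - 1] * test_scale


cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
cal_scales = [1.0] * len(cal_scores)
easy_prompt_threshold = adaptive_threshold(cal_scores, cal_scales, test_scale=0.5, alpha=0.25)
hard_prompt_threshold = adaptive_threshold(cal_scores, cal_scales, test_scale=1.5, alpha=0.25)
```

In this toy setup a prompt judged easier (smaller scale) gets a lower threshold and keeps more claims, while a harder prompt gets a higher threshold and is filtered more aggressively.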
