Conformal Selective Prediction with General Risk Control

arXiv cs.LG / 3/27/2026

Key Points

The paper introduces SCoRE (Selective Conformal Risk control with E-values), a framework for selective prediction that lets AI systems abstain when they are uncertain while still enforcing strict error/risk control on the subset of trusted predictions.
SCoRE constructs generalized, non-negative e-values using ideas from conformal inference and hypothesis testing. Data exchangeability alone guarantees that the product of each e-value with the unknown risk has expectation at most one.
The framework passes these e-values to hypothesis-testing procedures to produce binary “trust” decisions. This yields finite-sample guarantees on the risk among the positive (trusted) cases without relying on uniform concentration.
The method is model-agnostic and supports any user-defined, bounded, continuously valued risk. It can also be readily extended to settings with distribution shift.
The authors evaluate the approach in simulations and demonstrate its effectiveness in applications to error management in drug discovery, health risk prediction, and large language models, where abstention and reliability are critical.
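The sketch below makes the e-value property concrete for a single test point. It shows one generic conformal e-value and the Markov-type trust rule that this e-value supports. It is an illustration under stated assumptions, not the paper's construction. The following choices are all hypothetical and made only for this example:

- the function names `conformal_e_value` and `trust`;
- the convention that lower scores mean more reliable predictions;
- a threshold `t` fixed in advance;
- losses in [0, 1].

Over n calibration points with scores s_i and losses L_i, the e-value is e = (n+1)·1{s ≤ t} / (1 + Σ_{i: s_i ≤ t} L_i). Suppose the true test loss appeared in the denominator. Then the n+1 products L_j·e_j would sum to at most n+1, so by exchangeability each product has expectation at most 1. Replacing that unknown loss with its upper bound 1 can only shrink e.

```python
from typing import Sequence


def conformal_e_value(cal_scores: Sequence[float], cal_losses: Sequence[float],
                      test_score: float, t: float) -> float:
    """Generic conformal e-value for one test point (illustrative, hypothetical).

    Assumes lower scores mean more reliable predictions, losses in [0, 1],
    and a threshold t chosen independently of the calibration and test data.
    """
    n = len(cal_scores)
    if test_score > t:
        return 0.0  # e = 0: a point above the threshold is never trusted
    # Loss mass of calibration points that would also pass the threshold.
    cal_mass = sum(loss for s, loss in zip(cal_scores, cal_losses) if s <= t)
    # The unknown test loss in the denominator is replaced by its upper
    # bound 1; this can only shrink e, so E[L_test * e] <= 1 still holds.
    return (n + 1) / (1.0 + cal_mass)


def trust(e_value: float, alpha: float) -> bool:
    # Markov-type rule: E[L * 1{e >= 1/alpha}] <= alpha * E[L * e] <= alpha.
    return e_value >= 1.0 / alpha


# Usage: four calibration points and one test point (made-up numbers).
e = conformal_e_value([0.1, 0.2, 0.8, 0.9], [0.0, 0.5, 1.0, 1.0],
                      test_score=0.15, t=0.5)
print(e, trust(e, alpha=0.5), trust(e, alpha=0.1))  # → 3.3333333333333335 True False
```

When the test score passes the threshold, the trust rule reduces to 1 + Σ_{i: s_i ≤ t} L_i ≤ α(n+1). In words, the test point is trusted only if the calibration loss accumulated below the threshold is small relative to the calibration size. The resulting bound, E[L·1{trust}] ≤ α, controls the expected loss incurred on trusted cases. It is one simple form of risk control, not necessarily either of the two guarantees SCoRE provides.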

Abstract

In deploying artificial intelligence (AI) models, selective prediction offers the option to abstain from making a prediction when uncertain about model quality. To fulfill its promise, it is crucial to enforce strict and precise error control over cases where the model is trusted. We propose Selective Conformal Risk control with E-values (SCoRE), a new framework for deriving such decisions for any trained model and any user-defined, bounded and continuously-valued risk. SCoRE offers two types of guarantees on the risk among "positive" cases in which the system opts to trust the model. Built upon conformal inference and hypothesis testing ideas, SCoRE first constructs a class of (generalized) e-values, which are non-negative random variables whose product with the unknown risk has expectation no greater than one. Such a property is ensured by data exchangeability without requiring any modeling assumptions. Passing these e-values on to hypothesis testing procedures, we obtain binary trust decisions with finite-sample error control. SCoRE avoids the need for uniform concentration, and can be readily extended to settings with distribution shifts. We evaluate the proposed methods with simulations and demonstrate their efficacy through applications to error management in drug discovery, health risk prediction, and large language models.
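The abstract does not name the hypothesis-testing procedure that turns e-values into trust decisions. As an assumption for illustration, the sketch below uses the e-BH procedure, a standard way to select from many e-values at once. The helper name `e_bh` is hypothetical. Given e-values e_1, ..., e_m for m test points, e-BH finds the largest k such that the k-th largest e-value is at least m/(αk). It then trusts those k points.

```python
from typing import List, Sequence


def e_bh(e_values: Sequence[float], alpha: float) -> List[int]:
    """Return indices of test points selected (trusted) by e-BH at level alpha."""
    m = len(e_values)
    # Indices sorted from largest to smallest e-value.
    order = sorted(range(m), key=lambda j: e_values[j], reverse=True)
    k_star = 0
    for k in range(1, m + 1):
        # Keep the largest k whose k-th largest e-value clears m / (alpha * k).
        if e_values[order[k - 1]] >= m / (alpha * k):
            k_star = k
    # Trust the k_star points with the largest e-values.
    return sorted(order[:k_star])


# Usage: five test points with made-up e-values.
e_vals = [25.0, 0.0, 12.0, 3.0, 8.0]
print(e_bh(e_vals, alpha=0.25))  # → [0, 2, 4]
print(e_bh(e_vals, alpha=0.1))   # → []
```

Why this controls risk when each e_j satisfies E[L_j e_j] ≤ 1:

- Every selected point has e_j ≥ m/(α|R|), so 1/|R| ≤ α e_j/m.
- Summing over selected points gives Σ_{j∈R} L_j/|R| ≤ (α/m) Σ_j L_j e_j.
- Taking expectations bounds the average risk among trusted points, E[Σ_{j∈R} L_j / max(|R|, 1)], by α.

This holds with no assumption on the dependence between the e-values.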