CRPS-Optimal Binning for Conformal Regression

arXiv stat.ML / 3/24/2026

Key Points

  • The paper introduces a non-parametric conditional distribution estimation method for conformal regression that partitions covariate-sorted observations into contiguous bins and uses each bin’s empirical CDF as the predictive distribution.
  • Bin boundaries are selected by minimizing the total leave-one-out Continuous Ranked Probability Score (LOO-CRPS). The cost has a closed form that needs O(n^2 log n) precomputation and O(n^2) storage, and the globally optimal K-partition is recovered by dynamic programming in O(n^2 K) time.
  • The authors show that selecting the number of bins K by minimizing within-sample LOO-CRPS is misleading due to in-sample optimism, and instead propose choosing K using an alternating held-out split that yields a U-shaped test-CRPS criterion with a clear minimum.
  • After choosing K* and fitting the partition on the full data, the method constructs a Venn prediction band and a conformal prediction set using CRPS as the nonconformity score; the conformal set provides a finite-sample marginal coverage guarantee at any target error level ε.
  • Experiments on real benchmarks report substantially narrower prediction intervals than split-conformal baselines (Gaussian split conformal, CQR, CQR-QRF) while keeping coverage close to nominal.
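
The binning step described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation. It assumes the standard energy form of CRPS for an empirical CDF, under which the total LOO-CRPS of a bin with m >= 2 labels works out to m * W / (2 (m - 1)^2), where W is the sum of |y_a - y_b| over all ordered pairs in the bin. It also uses a 2D prefix sum over the pairwise-distance matrix in place of the paper's precomputation, and requires every bin to hold at least two points so that the leave-one-out score is defined. The function names `loo_crps_costs` and `optimal_partition` are hypothetical.

```python
import numpy as np

def loo_crps_costs(y):
    """cost[i, j] = total LOO-CRPS of the bin y[i:j]; inf if the bin has < 2 points."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.abs(y[:, None] - y[None, :])            # pairwise |y_a - y_b|
    P = np.zeros((n + 1, n + 1))
    P[1:, 1:] = D.cumsum(axis=0).cumsum(axis=1)    # 2D prefix sums of D
    idx = np.arange(n + 1)
    i, j = idx[:, None], idx[None, :]
    # W[i, j] = sum of D over the square block [i:j, i:j] (ordered pairs in the bin)
    W = P[j, j] - P[i, j] - P[j, i] + P[i, i]
    m = j - i                                      # bin sizes
    cost = np.full((n + 1, n + 1), np.inf)
    ok = m >= 2
    cost[ok] = (m * W)[ok] / (2.0 * (m[ok] - 1) ** 2)
    return cost

def optimal_partition(y_sorted, K):
    """Globally optimal split of covariate-sorted labels into K contiguous bins.

    Returns (bounds, total_cost); bin k is y_sorted[bounds[k]:bounds[k + 1]].
    """
    cost = loo_crps_costs(y_sorted)
    n = len(y_sorted)
    dp = np.full((K + 1, n + 1), np.inf)           # dp[k, j]: best cost of k bins on y[:j]
    arg = np.zeros((K + 1, n + 1), dtype=int)      # start index of the last bin
    dp[0, 0] = 0.0
    for k in range(1, K + 1):
        cand = dp[k - 1][:, None] + cost           # cand[i, j]: last bin is y[i:j]
        arg[k] = cand.argmin(axis=0)
        dp[k] = cand.min(axis=0)
    bounds = [n]
    for k in range(K, 0, -1):                      # backtrack from the full prefix
        bounds.append(int(arg[k, bounds[-1]]))
    return bounds[::-1], float(dp[K, n])

# Labels ordered by their covariate; two well-separated groups.
bounds, total = optimal_partition([0.0, 1.0, 2.0, 10.0, 11.0, 12.0], K=2)
print(bounds, total)
```

The dynamic programme itself runs in O(n^2 K) time, matching the complexity quoted above. Choosing K by the returned in-sample total would run into the optimism problem the authors describe, so in practice K would be picked on held-out data.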

Abstract

We propose a method for non-parametric conditional distribution estimation based on partitioning covariate-sorted observations into contiguous bins and using the within-bin empirical CDF as the predictive distribution. Bin boundaries are chosen to minimise the total leave-one-out Continuous Ranked Probability Score (LOO-CRPS), which admits a closed-form cost function with O(n^2 log n) precomputation and O(n^2) storage; the globally optimal K-partition is recovered by a dynamic programme in O(n^2 K) time. Minimising within-sample LOO-CRPS turns out to be inappropriate for selecting K, as it suffers from in-sample optimism. We instead select K by evaluating test CRPS on an alternating held-out split, which yields a U-shaped criterion with a well-defined minimum. Having selected K* and fitted the full-data partition, we form two complementary predictive objects: the Venn prediction band and a conformal prediction set based on CRPS as the nonconformity score, which carries a finite-sample marginal coverage guarantee at any prescribed level ε. On real benchmarks against split-conformal competitors (Gaussian split conformal, CQR, and CQR-QRF), the method produces substantially narrower prediction intervals while maintaining near-nominal coverage.
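
To make the idea of "CRPS as the nonconformity score" concrete, here is a minimal sketch of one plausible reading. The abstract does not spell out the construction, so everything below is an assumption: the bin containing the test covariate is treated as fixed, the candidate label is appended to that bin's labels, each point is scored by the CRPS of the empirical CDF of the remaining points, and a candidate is kept when its conformal p-value exceeds ε. The helper names (`crps_empirical`, `conformal_p_value`, `conformal_set`) and the candidate grid are hypothetical, and the sketch only illustrates the mechanics; it is not claimed to reproduce the paper's guarantee.

```python
import numpy as np

def crps_empirical(sample, y):
    """CRPS of the empirical CDF of `sample`, evaluated at the observation y."""
    s = np.asarray(sample, dtype=float)
    term1 = np.mean(np.abs(s - y))                        # E|Z - y|
    term2 = 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))  # 0.5 * E|Z - Z'|
    return term1 - term2

def conformal_p_value(bin_y, y_cand):
    """Fraction of points in the augmented bin at least as nonconforming as y_cand."""
    z = np.append(np.asarray(bin_y, dtype=float), y_cand)
    # Score each point against the empirical CDF of all the other points.
    scores = np.array([crps_empirical(np.delete(z, i), z[i]) for i in range(len(z))])
    return float(np.mean(scores >= scores[-1] - 1e-12))   # small tolerance for ties

def conformal_set(bin_y, grid, eps):
    """Candidates on `grid` whose p-value exceeds the error level eps."""
    return np.array([y for y in grid if conformal_p_value(bin_y, y) > eps])

bin_labels = np.arange(10.0)           # labels of the training points in one bin
grid = np.linspace(-20.0, 30.0, 101)   # hypothetical candidate grid
kept = conformal_set(bin_labels, grid, eps=0.1)
print(kept.min(), kept.max())
```

With m labels in the bin, the smallest attainable p-value is 1/(m + 1), so a bin needs more than 1/ε - 1 points before any candidate can be excluded at level ε. This is one reason the number of bins interacts with the width of the resulting sets.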