First-Order Efficiency for Probabilistic Value Estimation via a Statistical Viewpoint

arXiv stat.ML · May 5, 2026


Key Points

  • Probabilistic value estimation methods such as Shapley values and semivalues offer model-agnostic interpretability and data valuation, but exact computation is infeasible due to the exponential number of coalitions.
  • The paper identifies a unifying first-order error structure across several existing Monte Carlo estimators, showing the leading term is an augmented inverse-probability weighted influence term shaped by the sampling law and a chosen surrogate function.
  • It derives an explicit expression for the leading mean squared error (MSE), clarifying how statistical efficiency depends jointly on the sampling strategy and the surrogate.
  • Guided by this criterion, the authors propose EASE (Efficiency-Aware Surrogate-adjusted Estimator), which jointly selects the sampling law and the surrogate to minimize the leading MSE term.
  • Experiments indicate that EASE consistently outperforms state-of-the-art estimators across multiple probabilistic value estimation tasks.
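To make the computational bottleneck in the first bullet concrete, here is a minimal sketch contrasting exact Shapley computation (which sums over all 2^(n-1) coalitions) with a plain permutation-sampling Monte Carlo estimate. All names here are illustrative; this is the textbook construction, not the paper's EASE estimator.

```python
import itertools
import math
import random

def exact_shapley(n, utility, i):
    """Exact Shapley value of player i: weighted sum of marginal
    contributions over all 2^(n-1) coalitions not containing i."""
    others = [p for p in range(n) if p != i]
    total = 0.0
    for size in range(n):
        # Shapley weight |S|! (n - |S| - 1)! / n!
        weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
        for coalition in itertools.combinations(others, size):
            s = frozenset(coalition)
            total += weight * (utility(s | {i}) - utility(s))
    return total

def mc_shapley(n, utility, i, num_samples=2000, seed=0):
    """Monte Carlo estimate: average marginal contribution of player i
    over random permutations (a simple weighted-average estimator)."""
    rng = random.Random(seed)
    players = list(range(n))
    acc = 0.0
    for _ in range(num_samples):
        rng.shuffle(players)
        pos = players.index(i)
        before = frozenset(players[:pos])
        acc += utility(before | {i}) - utility(before)
    return acc / num_samples
```

For an additive utility, both estimators recover each player's own weight, which makes a quick sanity check easy; for a black-box model utility, only the Monte Carlo route remains feasible as n grows.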

Abstract

Probabilistic values, including Shapley values and semivalues, provide a model-agnostic framework to attribute the behavior of a black-box model to data points or features, with a wide range of applications including explainable artificial intelligence and data valuation. However, their exact computation requires utility evaluations over exponentially many coalitions, making Monte Carlo approximation essential in modern machine learning applications. Existing estimators are often developed through different identification strategies, including weighted averages, self-normalized weighting, regression adjustment, and weighted least squares. Our key observation is that these seemingly distinct constructions share a common first-order error structure, in which the leading term is an augmented inverse-probability weighted influence term determined by the sampling law and a working surrogate function. This first-order representation yields an explicit expression for the leading mean squared error (MSE), which characterizes how the sampling law and the surrogate jointly determine statistical efficiency. Guided by this criterion, we propose an Efficiency-Aware Surrogate-adjusted Estimator (EASE) that directly chooses the sampling law and surrogate to minimize the first-order MSE. We demonstrate that EASE consistently outperforms state-of-the-art estimators for various probabilistic values.
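The "augmented inverse-probability weighted" structure mentioned in the abstract can be illustrated with a generic AIPW construction; the notation below is illustrative and not taken from the paper. For a target $\mu = \sum_{S} w(S)\,\Delta(S)$ (a weighted sum of marginal contributions $\Delta(S)$ over coalitions $S$), a sampling law $p$ and a working surrogate $g$ give the estimator

```latex
\hat{\mu} \;=\; \frac{1}{m}\sum_{k=1}^{m} \frac{w(S_k)}{p(S_k)}\bigl(\Delta(S_k) - g(S_k)\bigr)
\;+\; \sum_{S} w(S)\, g(S),
\qquad S_k \sim p,
```

which is unbiased whenever $p$ puts mass on every coalition with $w(S) > 0$, assuming the second sum is tractable for the chosen surrogate (e.g., an additive $g$). Its leading variance is $\tfrac{1}{m}\mathrm{Var}_{p}\!\bigl[\tfrac{w(S)}{p(S)}(\Delta(S) - g(S))\bigr]$, which depends jointly on $p$ and $g$; minimizing a criterion of this kind over both is the spirit of the efficiency-aware choice the paper describes.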