Statistical Inference for Explainable Boosting Machines

arXiv stat.ML / 3/31/2026


Key Points

  • The paper addresses a key limitation of Explainable Boosting Machines (EBMs): while feature effect visualizations provide interpretability, uncertainty quantification for those learned univariate functions has been computationally expensive due to bootstrapping.
  • It introduces statistical inference methods for gradient boosting and provides end-to-end theoretical guarantees, enabling uncertainty estimates without resorting to costly resampling.
  • By replacing the sum of trees with a moving average ("Boulevard regularization"), the boosting procedure is shown to converge to a feature-wise kernel ridge regression, yielding asymptotically normal predictions.
  • The authors derive minimax-optimal mean-squared error rates for fitting Lipschitz generalized additive models (GAMs), including results that avoid the curse of dimensionality.
  • The work also constructs prediction intervals and feature-level confidence intervals with runtime that does not depend on the number of datapoints, and releases accompanying code on GitHub.
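The moving-average idea in the third bullet can be sketched in a few lines: instead of summing shrunken trees as in standard gradient boosting, each round averages the new tree into the running fit. This is a minimal illustration only, not the authors' implementation; the shrinkage value `lam`, the tree depth, and the simulated data are all hypothetical choices, and refinements in the actual Boulevard procedure (e.g. subsampling) are omitted.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Simulated additive data: y = sin(3*x1) + x2^2 + noise (hypothetical example)
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1, 1, size=(n, 2))
signal = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
y = signal + rng.normal(0, 0.1, n)

lam = 0.8   # shrinkage parameter (illustrative value)
B = 200     # number of boosting rounds
f = np.zeros(n)

for b in range(1, B + 1):
    tree = DecisionTreeRegressor(max_depth=2, random_state=b)
    tree.fit(X, y - f)  # fit the current residuals, as in ordinary boosting
    # Boulevard-style update: average the trees instead of summing them,
    # i.e. f_b = ((b-1) * f_{b-1} + lam * T_b) / b
    f = ((b - 1) * f + lam * tree.predict(X)) / b

resid_mse = np.mean((y - f) ** 2)
```

The averaging keeps each individual tree's influence at O(1/b), which is what drives the convergence to a kernel ridge regression limit and the asymptotic normality described above.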

Abstract

Explainable boosting machines (EBMs) are popular "glass-box" models that learn a set of univariate functions using boosting trees. These achieve explainability through visualizations of each feature's effect. However, unlike linear model coefficients, uncertainty quantification for the learned univariate functions requires computationally intensive bootstrapping, making it hard to know which features truly matter. We provide an alternative using recent advances in statistical inference for gradient boosting, deriving methods for statistical inference as well as end-to-end theoretical guarantees. Using a moving average instead of a sum of trees (Boulevard regularization) allows the boosting process to converge to a feature-wise kernel ridge regression. This produces asymptotically normal predictions that achieve the minimax-optimal MSE for fitting Lipschitz GAMs with p features of O(p n^{-2/3}), successfully avoiding the curse of dimensionality. We then construct prediction intervals for the response and confidence intervals for each learned univariate function with a runtime independent of the number of datapoints, enabling further explainability within EBMs. Code is available at https://github.com/hetankevin/ebm-inference.
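Since the limiting predictions are asymptotically normal, the prediction and confidence intervals mentioned in the abstract can be formed Wald-style from a point prediction and a standard-error estimate. The sketch below shows only that final step; the function name and inputs are hypothetical, and the paper's actual construction of the variance estimate (the part with runtime independent of the number of datapoints) is not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def normal_interval(f_hat, se, alpha=0.05):
    """Two-sided (1 - alpha) interval for an asymptotically normal estimate.

    f_hat : point prediction (or pointwise value of a learned shape function)
    se    : estimated standard error at that point (assumed given)
    """
    z = norm.ppf(1 - alpha / 2)  # e.g. ~1.96 for a 95% interval
    return f_hat - z * se, f_hat + z * se

# Hypothetical usage: a prediction of 1.0 with standard error 0.2
lo, hi = normal_interval(1.0, 0.2)
```

The same formula applies pointwise to each learned univariate function to obtain the feature-level confidence bands that the paper uses to judge which features truly matter.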