LOCO Feature Importance Inference without Data Splitting via Minipatch Ensembles

arXiv stat.ML / 2026/3/24

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper proposes a mostly model-agnostic, distribution-free framework for feature importance inference built on feature-occlusion / leave-one-covariate-out (LOCO) ideas, but without requiring data splitting.
  • The core method uses “minipatch ensembles,” which combine random observation and feature subsampling so that inference can be performed directly with the trained ensembles, with no model refitting and no held-out test data (see the sketch after this list).
  • It argues for both computational and statistical efficiency while addressing interpretability issues commonly caused by data-splitting procedures.
  • The authors provide theoretical results showing asymptotic validity of confidence intervals under mild assumptions, even when training and inference reuse the same data.
  • Practical theory-backed remedies are included for challenges like vanishing variance for null features and performing inference after data-driven hyperparameter tuning.
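To make the minipatch mechanics concrete, here is a minimal Python sketch, written by us rather than taken from the paper: train K small models on random (observation, feature) subsamples, then score feature j at observation i by comparing the error of the sub-ensemble that saw neither observation i nor feature j against the sub-ensemble that merely never saw observation i. The function name `minipatch_loco`, the ridge base learner, and squared-error loss are all illustrative assumptions; the framework itself is described as mostly model-agnostic.

```python
import numpy as np
from sklearn.linear_model import Ridge

def minipatch_loco(X, y, K=500, n_frac=0.5, m_frac=0.5, seed=0):
    """Per-observation LOCO importance scores from one minipatch ensemble.

    Illustrative sketch only: ridge base learners and squared-error loss.
    K should be large enough that every (observation, feature) pair has
    plenty of patches excluding both.
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape
    n, m = max(2, int(n_frac * N)), max(2, int(m_frac * M))
    preds = np.empty((K, N))                # patch k's prediction at every point
    obs_in = np.zeros((K, N), dtype=bool)   # did patch k train on observation i?
    feat_in = np.zeros((K, M), dtype=bool)  # did patch k use feature j?
    for k in range(K):
        I = rng.choice(N, size=n, replace=False)  # random observation subsample
        J = rng.choice(M, size=m, replace=False)  # random feature subsample
        obs_in[k, I], feat_in[k, J] = True, True
        model = Ridge().fit(X[np.ix_(I, J)], y[I])
        preds[k] = model.predict(X[:, J])
    # Leave-one-observation-out ensemble prediction at each point i:
    # average only over patches that never trained on observation i.
    loo = np.array([preds[~obs_in[:, i], i].mean() for i in range(N)])
    deltas = np.empty((N, M))
    for j in range(M):
        # LOCO prediction: additionally restrict to patches that never saw feature j.
        keep = ~feat_in[:, j]
        loco = np.array([preds[keep & ~obs_in[:, i], i].mean() for i in range(N)])
        # Importance of feature j at observation i: increase in squared error
        # when feature j is occluded. No refitting, no held-out data.
        deltas[:, j] = (y - loco) ** 2 - (y - loo) ** 2
    return deltas
```

Note the design point this illustrates: because each minipatch randomly excludes both observations and features, the occluded predictions needed for LOCO come for free from the already-trained ensemble, which is where the claimed computational savings over refit-based LOCO originate.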

Abstract

Feature importance inference is critical for the interpretability and reliability of machine learning models. There has been increasing interest in developing model-agnostic approaches to interpret any predictive model, often in the form of feature occlusion or leave-one-covariate-out (LOCO) inference. Existing methods typically make limiting distributional or modeling assumptions and require data splitting. In this work, we develop a novel, mostly model-agnostic, and distribution-free inference framework for feature importance in regression or classification tasks that does not require data splitting. Our approach leverages a form of random observation and feature subsampling called minipatch ensembles; it utilizes the trained ensembles for inference and requires no model refitting or held-out test data after training. We show that our approach enjoys both computational and statistical efficiency and circumvents the interpretational challenges of data splitting. Further, despite using the same data for training and inference, we show the asymptotic validity of our confidence intervals under mild assumptions. Additionally, we propose theory-supported solutions to critical practical issues including vanishing variance for null features and inference after data-driven tuning for hyperparameters. We demonstrate the advantages of our approach over existing methods on a series of synthetic and real data examples.
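The abstract's claim of asymptotically valid confidence intervals despite data reuse can be read as: the per-observation importance scores behave approximately like independent terms, so a Wald-style interval around their mean is justified. Below is a minimal sketch of that interval construction under our simplifying assumptions (a plain normal approximation; it deliberately omits the paper's specific remedies for vanishing variance under null features and post-tuning inference). It reuses the hypothetical `minipatch_loco` from the sketch above.

```python
import numpy as np
from scipy import stats

def loco_confidence_intervals(deltas, alpha=0.10):
    """Wald-style CIs for mean feature importance from an (N, M) matrix of
    per-observation LOCO scores. Plain normal approximation; the paper
    additionally corrects for the vanishing variance of null features."""
    N = deltas.shape[0]
    mean = deltas.mean(axis=0)
    se = deltas.std(axis=0, ddof=1) / np.sqrt(N)
    z = stats.norm.ppf(1 - alpha / 2)
    return mean - z * se, mean + z * se

# Usage on synthetic data where only the first feature matters
# (assumes minipatch_loco from the earlier sketch is in scope):
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 2 * X[:, 0] + rng.normal(size=200)
lo, hi = loco_confidence_intervals(minipatch_loco(X, y))
print(np.round(lo, 3), np.round(hi, 3))  # feature 0's interval should sit above 0
```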
