Perturbing the Derivative: Doubly Wild Refitting for Model-Free Evaluation of Opaque Machine Learning Predictors

arXiv stat.ML / 3/26/2026


Key Points

  • The paper addresses excess risk evaluation for empirical risk minimization (ERM) under convex losses, proposing a model-free approach that avoids using the global structure of the hypothesis class.
  • It leverages “wild refitting” to produce “wild optimism” bounds by constructing two pseudo-outcome datasets via stochastic derivative perturbations with tuned scaling.
  • Using only black-box access to the training algorithm and a single dataset (under a fixed design setting), the method refits the black box twice to obtain two wild predictors.
  • The resulting framework yields an efficiently computable upper bound on the excess risk without requiring prior knowledge of the function class's complexity, aiming to better support evaluation of opaque deep neural networks and generative models.
  • The work is positioned as promising for theoretical evaluation where traditional learning-theory analyses can be infeasible for extremely complex modern models.

Abstract

We study the problem of excess risk evaluation for empirical risk minimization (ERM) under convex losses. We show that by leveraging the idea of wild refitting, one can upper bound the excess risk through the so-called "wild optimism," without relying on the global structure of the underlying function class but only assuming black-box access to the training algorithm and a single dataset. We begin by generating two sets of artificially modified pseudo-outcomes created by stochastically perturbing the derivatives with carefully chosen scaling. Using these pseudo-labeled datasets, we refit the black-box procedure twice to obtain two wild predictors and derive an efficient excess risk upper bound under the fixed design setting. Requiring no prior knowledge of the complexity of the underlying function class, our method is essentially model-free and holds significant promise for theoretically evaluating modern opaque deep neural networks and generative models, where traditional learning theory can be infeasible due to the extreme complexity of the hypothesis class.
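To make the procedure concrete, here is a minimal sketch of the refitting loop for the special case of squared loss, where perturbing the loss derivative reduces to perturbing residuals with random (wild bootstrap) signs. The function names (`wild_refit`, `train`), the scale parameter `rho`, and the squared-loss specialization are illustrative assumptions, not the paper's exact construction; the paper's analysis covers general convex losses and a tuned choice of scaling.

```python
import numpy as np

def wild_refit(X, y, train, rho=1.0, seed=0):
    """Illustrative sketch of doubly wild refitting (squared-loss case).

    `train` is the black-box fitting routine: train(X, y) -> predictor f,
    where f maps a design matrix to predictions. Only black-box access to
    `train` and the single dataset (X, y) is assumed (fixed design).
    """
    rng = np.random.default_rng(seed)
    f_hat = train(X, y)            # original black-box fit on the real data
    resid = y - f_hat(X)           # for squared loss, the derivative is -2*resid
    eps = rng.choice([-1.0, 1.0], size=len(y))   # Rademacher (wild) signs

    # Two symmetric pseudo-outcome datasets from sign-perturbed,
    # rho-scaled residuals around the fitted values.
    y_plus = f_hat(X) + rho * eps * resid
    y_minus = f_hat(X) - rho * eps * resid

    # Refit the same black box twice to obtain the two wild predictors.
    f_plus, f_minus = train(X, y_plus), train(X, y_minus)

    # A simple proxy for the "wild optimism": how far the two wild
    # predictors move from the original fit on the design points.
    opt = 0.5 * (np.mean((f_plus(X) - f_hat(X)) ** 2)
                 + np.mean((f_minus(X) - f_hat(X)) ** 2))
    return f_hat, f_plus, f_minus, opt
```

In practice `train` could wrap any opaque learner (a neural network training run, a boosted ensemble); the sketch only requires that it can be called again on the pseudo-labeled data with the same configuration.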