Nonparametric Instrumental Regression via Kernel Methods is Minimax Optimal

arXiv stat.ML / 4/9/2026


Key Points

  • The paper analyzes the kernel instrumental variable (KIV) algorithm, a two-stage, kernel-based approach to nonparametric instrumental variable regression, providing convergence guarantees for both identified and non-identified settings (a closed-form sketch of the two-stage estimator follows this list).
  • In the non-identified regime, it proves that the KIV estimator converges to the minimum-norm instrumental-variable solution in the relevant RKHS, with convergence established in the strong L_2 norm rather than only in a weaker pseudo-norm.
  • It characterizes statistical difficulty using a “link condition” that measures ill-posedness by comparing the covariance structure of the endogenous regressor to that implied by the instrument (a typical formalization appears below this list).
  • Under eigenvalue-decay and source assumptions, the authors derive minimax-optimal learning rates in the strong L_2 norm over fixed smoothness classes and prove a matching lower bound, highlighting an unavoidable slowdown relative to ordinary kernel ridge regression.
  • It improves the first stage of KIV by replacing the usual stage-1 Tikhonov regularization with general spectral regularization, which avoids saturation and can yield faster rates for smoother first-stage targets (illustrative filter functions appear after the abstract).
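
The two-stage structure described in the first bullet has a compact closed form. Below is a minimal numpy sketch along the lines of the original KIV algorithm (Singh, Sahani & Gretton, 2019); the Gaussian kernel, the bandwidth `gamma`, and the regularization values `lam`/`xi` are illustrative assumptions, not the paper's choices, and in practice both stages would be tuned (e.g., by cross-validation).

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def kiv_fit_predict(x1, z1, y2, z2, x_test, lam=1e-3, xi=1e-3, gamma=1.0):
    """Closed-form two-stage kernel IV.

    Stage 1 uses (x1, z1) to ridge-regress the feature map of X on Z
    (a conditional mean embedding); stage 2 ridge-regresses the stage-2
    outcomes y2 on the embedded instruments z2.
    """
    n, m = len(z1), len(z2)
    K_zz = rbf_kernel(z1, z1, gamma)              # stage-1 instrument Gram matrix
    K_zt = rbf_kernel(z1, z2, gamma)              # stage-1 vs. stage-2 instruments
    K_xx = rbf_kernel(x1, x1, gamma)              # stage-1 regressor Gram matrix
    # Stage 1: ridge solve mapping instrument features to regressor features.
    W = K_xx @ np.linalg.solve(K_zz + n * lam * np.eye(n), K_zt)   # (n, m)
    # Stage 2: ridge regression of y2 on the stage-1 predictions.
    alpha = np.linalg.solve(W @ W.T + m * xi * K_xx, W @ y2)       # (n,)
    # Evaluate the estimated structural function at the test points.
    return rbf_kernel(x_test, x1, gamma) @ alpha

# Toy usage on simulated confounded data: u is a hidden confounder,
# z is a valid instrument, and the structural function is sin(x).
rng = np.random.default_rng(0)
u = rng.normal(size=(500, 1))
z = rng.normal(size=(500, 1))
x = z + u + 0.1 * rng.normal(size=(500, 1))
y = np.sin(x).ravel() + u.ravel()
h_hat = kiv_fit_predict(x[:250], z[:250], y[250:], z[250:],
                        x_test=np.linspace(-3, 3, 50)[:, None])
```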

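The “link condition” in the third bullet relates two covariance structures. A common way such conditions are formalized in the nonparametric IV literature is sketched below; this is an illustrative form under standard conventions, not necessarily the paper's exact statement.

```latex
c_1 \,\lVert C_X^{p/2} f \rVert_{L_2}
  \;\le\; \lVert T f \rVert_{L_2}
  \;\le\; c_2 \,\lVert C_X^{p/2} f \rVert_{L_2},
\qquad (Tf)(z) := \mathbb{E}[f(X) \mid Z = z],
```

where C_X is the covariance operator of the regressor features and c_2 ≥ c_1 > 0. The exponent p ≥ 0 indexes ill-posedness: p = 0 recovers an essentially well-posed regression problem, while larger p means the instrument transmits less information about f and the attainable rate degrades accordingly.
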
Abstract

We study the kernel instrumental variable (KIV) algorithm, a kernel-based two-stage least-squares method for nonparametric instrumental variable regression. We provide a convergence analysis covering both identified and non-identified regimes: when the structural function is not identified, we show that the KIV estimator converges to the minimum-norm IV solution in the reproducing kernel Hilbert space associated with the kernel. Crucially, we establish convergence in the strong L_2 norm, rather than only in a pseudo-norm. We quantify statistical difficulty through a link condition that compares the covariance structure of the endogenous regressor with that induced by the instrument, yielding an interpretable measure of ill-posedness. Under standard eigenvalue-decay and source assumptions, we derive strong L_2 learning rates for KIV and prove that they are minimax-optimal over fixed smoothness classes. Finally, we replace the stage-1 Tikhonov step by general spectral regularization, thereby avoiding saturation and improving rates for smoother first-stage targets. The matching lower bound shows that instrumental regression induces an unavoidable slowdown relative to ordinary kernel ridge regression.
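
The move from Tikhonov to general spectral regularization in stage 1 can be illustrated with filter functions, a standard device in the inverse-problems literature; the specific filters below are examples, not a claim about which ones the paper uses. The idea is to replace the ridge inverse (C + λI)^{-1} with g_λ(C) for a filter g_λ approximating σ ↦ 1/σ:

```latex
g_\lambda^{\mathrm{ridge}}(\sigma) = \frac{1}{\sigma + \lambda}
  \quad \text{(Tikhonov; qualification 1)},
\qquad
g_\lambda^{\mathrm{cutoff}}(\sigma) = \frac{1}{\sigma}\,\mathbf{1}\{\sigma \ge \lambda\}
  \quad \text{(spectral cutoff; arbitrary qualification)}.
```

A filter's qualification caps the smoothness it can exploit: Tikhonov's qualification of 1 makes its rate saturate once the first-stage target is smoother than that level, whereas higher-qualification filters (iterated Tikhonov, spectral cutoff, Landweber iteration) keep improving, which is the sense in which the generalized first stage avoids saturation.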