Calibrated Principal Component Regression

arXiv stat.ML / 4/27/2026


Key Points

  • The paper introduces Calibrated Principal Component Regression (CPCR) as a new inference method for generalized linear models in overparameterized settings.
  • It addresses the key drawback of Principal Component Regression (PCR), truncation bias, by first learning a low-variance prior inside the principal-component subspace and then calibrating the model in the original feature space with a centered Tikhonov step.
  • CPCR uses cross-fitting and softens PCR’s hard cutoff, which keeps the truncation bias under control.
  • The authors derive out-of-sample risk bounds in the random matrix regime and show CPCR can outperform PCR when the true regression signal has meaningful components in low-variance directions.
  • Experiments across multiple overparameterized problems indicate that CPCR consistently improves prediction and remains stable and flexible.
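
To see what "softening the hard cutoff" can mean, here is a worked derivation for one simplified case: squared loss, the prior and the calibration computed on the same sample (no cross-fitting), and a plain ridge penalty centered at the PCR estimate. The notation (X, y, s_i, u_i, v_i, k, λ, f_i) is introduced here for illustration and is not taken from the paper, whose estimator for generalized linear models may differ in its details.

```latex
\begin{align*}
X &= \sum_{i} s_i\, u_i v_i^\top \quad (s_1 \ge s_2 \ge \dots > 0),
\qquad
\hat\beta_{\mathrm{PCR}} = \sum_{i \le k} \frac{u_i^\top y}{s_i}\, v_i, \\[4pt]
\hat\beta_\lambda
&= \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda\, \|\beta - \hat\beta_{\mathrm{PCR}}\|_2^2 \\
&= \hat\beta_{\mathrm{PCR}} + (X^\top X + \lambda I)^{-1} X^\top \bigl(y - X\hat\beta_{\mathrm{PCR}}\bigr) \\
&= \sum_{i} f_i\, \frac{u_i^\top y}{s_i}\, v_i,
\qquad
f_i =
\begin{cases}
1, & i \le k, \\[2pt]
\dfrac{s_i^2}{s_i^2 + \lambda}, & i > k.
\end{cases}
\end{align*}
```

The last step uses the fact that the PCR residual is orthogonal to u_1, …, u_k, so the correction term only acts on directions i > k. Standard PCR corresponds to f_i = 0 for i > k and is recovered as λ → ∞; as λ → 0 the estimator approaches the minimum-norm least-squares solution. For intermediate λ, the discarded low-variance directions are shrunk rather than removed outright.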

Abstract

We propose a new method for statistical inference in generalized linear models. In the overparameterized regime, Principal Component Regression (PCR) reduces variance by projecting high-dimensional data to a low-dimensional principal subspace before fitting. However, PCR incurs truncation bias whenever the true regression vector has mass outside the retained principal components (PC). To mitigate the bias, we propose Calibrated Principal Component Regression (CPCR), which first learns a low-variance prior in the PC subspace and then calibrates the model in the original feature space via a centered Tikhonov step. CPCR leverages cross-fitting and controls the truncation bias by softening PCR's hard cutoff. Theoretically, we calculate the out-of-sample risk in the random matrix regime, which shows that CPCR outperforms standard PCR when the regression signal has non-negligible components in low-variance directions. Empirically, CPCR consistently improves prediction across multiple overparameterized problems. The results highlight CPCR's stability and flexibility in modern overparameterized settings.
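
The two-stage idea in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the linear model with squared loss, assumes the columns of X are already centered, and uses a simple two-fold cross-fit in which the PCR prior is learned on one half of the data and the centered Tikhonov (ridge) step is run on the other half. The function names `pcr_fit`, `centered_ridge`, and `cpcr_sketch`, and the choice to average the two fold estimates, are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Least squares on the top-k principal directions, mapped back to feature space.

    Assumes the columns of X are already centered.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                     # (p, k) top-k right singular vectors
    Z = X @ V_k                        # (n, k) principal-component scores
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return V_k @ gamma                 # (p,) coefficient vector in feature space


def centered_ridge(X, y, beta_prior, lam):
    """Solve argmin_b ||y - X b||^2 + lam * ||b - beta_prior||^2.

    Uses the dual (n x n) form, which is cheaper when p > n.
    """
    n = X.shape[0]
    resid = y - X @ beta_prior
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), resid)
    return beta_prior + X.T @ alpha


def cpcr_sketch(X, y, k, lam, seed=0):
    """Two-fold cross-fit: prior on one half, calibration on the other, then average.

    The fold scheme and the averaging are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), 2)
    estimates = []
    for a, b in ((0, 1), (1, 0)):
        prior = pcr_fit(X[folds[a]], y[folds[a]], k)
        estimates.append(centered_ridge(X[folds[b]], y[folds[b]], prior, lam))
    return np.mean(estimates, axis=0)
```

In this sketch, `lam` controls how strongly the calibrated estimate is pulled toward the PCR prior: a very large `lam` returns essentially the prior itself, while a very small `lam` in the overparameterized case (p > n) yields an interpolating solution that stays as close to the prior as possible.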