MEC: Machine-Learning-Assisted Generalized Entropy Calibration for Semi-Supervised Mean Estimation

arXiv stat.ML / 4/8/2026

Key Points

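  • The paper targets settings with few labels and abundant unlabeled covariates. It proposes MEC (Machine-Learning-Assisted Generalized Entropy Calibration) to address two weaknesses of predictor-based prediction-powered inference (PPI): loss of efficiency and distorted coverage.
  • MEC combines cross-fitting with calibration weighting. It reweights the labeled samples to match the target population, using a principled calibration framework based on Bregman projections.
  • The approach is robust to affine transformations of the predictor. It also replaces validity conditions on the raw prediction error with conditions on the projection error, which yields theoretical guarantees under weaker assumptions than before.
  • As a result, MEC attains the semiparametric efficiency bound under weaker assumptions than existing PPI variants. In simulations and a real-data application it shows near-nominal coverage and narrower confidence intervals.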
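The calibration-weighting idea in the points above can be illustrated with a minimal sketch. It assumes the Kullback-Leibler (entropy) divergence as the Bregman generator, calibration covariates h(x) = (1, f(x)), uniform base weights over the labeled sample, and no cross-fitting or variance estimation. The function names (`entropy_calibration_weights`, `calibrated_mean`, `predictor`) and the simulated data are hypothetical, and this is not the paper's MEC implementation.

```python
import numpy as np


def entropy_calibration_weights(h_lab, target, max_iter=50, tol=1e-10):
    """KL (entropy) calibration: Bregman projection of uniform weights onto
    {w : w @ h_lab == target}, solved by damped Newton on the convex dual."""
    n, k = h_lab.shape
    d = np.full(n, 1.0 / n)  # base weights: uniform over the labeled sample
    lam = np.zeros(k)

    def dual(l):
        # Convex dual objective; its minimizer gives the calibration weights.
        return d @ np.exp(h_lab @ l) - l @ target

    for _ in range(max_iter):
        w = d * np.exp(h_lab @ lam)  # exponential-tilting form of the KL solution
        grad = w @ h_lab - target    # violation of the calibration constraints
        if np.max(np.abs(grad)) < tol:
            break
        hess = h_lab.T @ (w[:, None] * h_lab)
        step = np.linalg.solve(hess, grad)
        t, obj = 1.0, dual(lam)
        # Backtracking (Armijo) line search keeps the Newton iteration stable.
        while dual(lam - t * step) > obj - 0.25 * t * (grad @ step) and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    return d * np.exp(h_lab @ lam)


def calibrated_mean(y_lab, f_lab, f_unlab):
    """Weighted mean of labeled outcomes, with weights calibrated so the
    weighted mean of predictions matches the unlabeled mean of predictions."""
    h = np.column_stack([np.ones_like(f_lab), f_lab])  # intercept + prediction
    target = np.array([1.0, f_unlab.mean()])
    w = entropy_calibration_weights(h, target)
    return w @ y_lab, w


def predictor(x):
    # Hypothetical, deliberately miscalibrated predictor of Y.
    return 0.5 + 1.5 * x


rng = np.random.default_rng(0)
x_lab, x_unlab = rng.normal(size=200), rng.normal(size=10_000)
y_lab = 1.0 + 2.0 * x_lab + rng.normal(size=200)

est, w = calibrated_mean(y_lab, predictor(x_lab), predictor(x_unlab))
# Calibrate on an affine transform of the same predictor.
est_affine, _ = calibrated_mean(
    y_lab, 3.0 + 2.0 * predictor(x_lab), 3.0 + 2.0 * predictor(x_unlab)
)
print(est, est_affine)
```

Because the constraint sets {sum of w = 1, weighted mean of f = target} and the same constraints written for a + b·f (b ≠ 0) coincide, the KL projection and hence the estimate are unchanged. This is one concrete way to see the affine robustness described in the key points.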

Abstract

Obtaining high-quality labels is costly, whereas unlabeled covariates are often abundant, motivating semi-supervised inference methods with reliable uncertainty quantification. Prediction-powered inference (PPI) leverages a machine-learning predictor trained on a small labeled sample to improve efficiency, but it can lose efficiency under model misspecification and suffer from coverage distortions due to label reuse. We introduce Machine-Learning-Assisted Generalized Entropy Calibration (MEC), a cross-fitted, calibration-weighted variant of PPI. MEC improves efficiency by reweighting labeled samples to better align with the target population, using a principled calibration framework based on Bregman projections. This yields robustness to affine transformations of the predictor and relaxes requirements for validity by replacing conditions on raw prediction error with weaker projection-error conditions. As a result, MEC attains the semiparametric efficiency bound under weaker assumptions than existing PPI variants. Across simulations and a real-data application, MEC achieves near-nominal coverage and tighter confidence intervals than CF-PPI and vanilla PPI.
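For reference, the vanilla PPI mean estimator that MEC builds on can be sketched as the average prediction on unlabeled data plus the average prediction error (the "rectifier") on labeled data, with a normal-approximation interval. This minimal sketch assumes a predictor fitted independently of the labeled sample, so it sidesteps the label-reuse issue that CF-PPI and MEC handle through cross-fitting. The helper names (`ppi_mean`, `classical_mean`, `predictor`) and the simulated data are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def ppi_mean(y_lab, f_lab, f_unlab, alpha=0.05):
    """Vanilla prediction-powered estimate of E[Y] with a normal-approximation CI.

    Assumes the predictor was fitted on data independent of (y_lab, f_lab).
    """
    n, big_n = len(y_lab), len(f_unlab)
    rectifier = y_lab - f_lab                # prediction errors on labeled data
    est = f_unlab.mean() + rectifier.mean()  # plug-in mean + bias correction
    se = np.sqrt(f_unlab.var(ddof=1) / big_n + rectifier.var(ddof=1) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return est, (est - z * se, est + z * se)


def classical_mean(y_lab, alpha=0.05):
    """Labeled-only sample mean and CI, for comparison."""
    est = y_lab.mean()
    se = y_lab.std(ddof=1) / np.sqrt(len(y_lab))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return est, (est - z * se, est + z * se)


def predictor(x):
    # Hypothetical predictor, fixed in advance (no labels are reused).
    return 1.0 + 2.0 * x


rng = np.random.default_rng(1)
x_lab, x_unlab = rng.normal(size=200), rng.normal(size=10_000)
y_lab = 1.0 + 2.0 * x_lab + rng.normal(size=200)

ppi_est, ppi_ci = ppi_mean(y_lab, predictor(x_lab), predictor(x_unlab))
cls_est, cls_ci = classical_mean(y_lab)
print(ppi_est, ppi_ci)
print(cls_est, cls_ci)
```

Here the predictor explains most of the variance in Y, so the PPI interval should come out narrower than the labeled-only one. With a poor or misspecified predictor the rectifier variance grows and the advantage shrinks or even reverses, which is the efficiency loss the abstract attributes to vanilla PPI.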