Pseudo-Labeling for Unsupervised Domain Adaptation with Kernel GLMs

arXiv stat.ML / 3/23/2026


Key Points

  • Proposes a principled framework for unsupervised domain adaptation under covariate shift in kernel GLMs, covering kernelized linear, logistic, and Poisson regression with ridge regularization.
  • Splits labeled source data into two batches: one to train a family of candidate models and one to build an imputation model that generates pseudo-labels for the target data, enabling robust model selection.
  • Establishes non-asymptotic excess-risk bounds characterized by an "effective labeled sample size" that accounts for unknown covariate shift, providing theoretical guarantees.
  • Demonstrates empirical gains over source-only baselines on synthetic and real datasets, validating the approach.

Abstract

We propose a principled framework for unsupervised domain adaptation under covariate shift in kernel Generalized Linear Models (GLMs), encompassing kernelized linear, logistic, and Poisson regression with ridge regularization. Our goal is to minimize prediction error in the target domain by leveraging labeled source data and unlabeled target data, despite differences in covariate distributions. We partition the labeled source data into two batches: one for training a family of candidate models, and the other for building an imputation model. This imputation model generates pseudo-labels for the target data, enabling robust model selection. We establish non-asymptotic excess-risk bounds that characterize adaptation performance through an "effective labeled sample size", explicitly accounting for the unknown covariate shift. Experiments on synthetic and real datasets demonstrate consistent performance gains over source-only baselines.
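To make the two-batch procedure concrete, here is a minimal sketch of the pipeline for the kernelized linear (squared-loss) case: split the labeled source data, fit a family of candidate kernel ridge models on one batch, fit an imputation model on the other, impute pseudo-labels on the unlabeled target covariates, and select the candidate with the smallest pseudo-labeled target risk. All function names, the RBF kernel choice, and the fixed ridge parameter for the imputation model are illustrative assumptions, not details prescribed by the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian RBF kernel matrix between the rows of A and B (illustrative kernel choice).
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def fit_krr(K, y, lam):
    # Kernel ridge regression: solve (K + lam * n * I) alpha = y.
    n = K.shape[0]
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def select_by_pseudo_labels(Xs, ys, Xt, lams, gamma=1.0, seed=0):
    """Two-batch pseudo-labeling for model selection under covariate shift.

    Xs, ys: labeled source data; Xt: unlabeled target covariates;
    lams: candidate ridge parameters. Returns the selected (lam, alpha, X_train).
    """
    rng = np.random.default_rng(seed)
    n = len(Xs)
    idx = rng.permutation(n)
    i1, i2 = idx[: n // 2], idx[n // 2:]
    X1, y1 = Xs[i1], ys[i1]   # batch 1: candidate models
    X2, y2 = Xs[i2], ys[i2]   # batch 2: imputation model

    # Batch 1: one candidate model per ridge parameter.
    K11 = rbf_kernel(X1, X1, gamma)
    candidates = [fit_krr(K11, y1, lam) for lam in lams]

    # Batch 2: imputation model with a fixed small ridge parameter
    # (an assumed heuristic here, not the paper's tuning rule).
    K22 = rbf_kernel(X2, X2, gamma)
    alpha_imp = fit_krr(K22, y2, lam=min(lams))

    # Pseudo-labels on the target covariates from the imputation model.
    y_pseudo = rbf_kernel(Xt, X2, gamma) @ alpha_imp

    # Select the candidate minimizing squared error against the pseudo-labels.
    K_t1 = rbf_kernel(Xt, X1, gamma)
    risks = [np.mean((K_t1 @ a - y_pseudo) ** 2) for a in candidates]
    best = int(np.argmin(risks))
    return lams[best], candidates[best], X1
```

The same skeleton extends to the logistic and Poisson cases by swapping the closed-form ridge solve for an iteratively reweighted fit and scoring candidates with the matching GLM loss on the pseudo-labeled target data.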