A General Representation-Based Approach to Multi-Source Domain Adaptation

arXiv cs.LG / 4/28/2026

📰 NewsIdeas & Deep AnalysisModels & Research

Key Points

  • The paper studies unsupervised multi-source domain adaptation, focusing on which information should be transferred from labeled sources to an unlabeled target domain.
  • It argues that many existing deep latent-representation methods depend on restrictive identifiability assumptions that limit performance in real-world settings.
  • The authors propose learning compact latent representations tied to the prediction task, and show that using all predictive information (the label’s Markov blanket) can be underdetermined in general cases.
  • They provide a key insight for identifiability by partitioning Markov blanket representations into label parents, children, and spouses, enabling a general adaptation framework.
  • Building on the theory, the work introduces a practical nonparametric domain adaptation method designed to handle different kinds of distribution shifts.

Abstract

A central problem in unsupervised domain adaptation is determining what to transfer from labeled source domains to an unlabeled target domain. To handle high-dimensional observations (e.g., images), a line of approaches use deep learning to learn latent representations of the observations, which facilitate knowledge transfer in the latent space. However, existing approaches often rely on restrictive assumptions to establish identifiability of the joint distribution in the target domain, such as independent latent variables or invariant label distributions, limiting their real-world applicability. In this work, we propose a general domain adaptation framework that learns compact latent representations to capture distribution shifts relative to the prediction task and address the fundamental question of what representations should be learned and transferred. Notably, we first demonstrate that learning representations based on all the predictive information, i.e., the label's Markov blanket in terms of the learned representations, is often underspecified in general settings. Instead, we show that, interestingly, general domain adaptation can be achieved by partitioning the representations of Markov blanket into those of the label's parents, children, and spouses. Moreover, its identifiability guarantee can be established. Building on these theoretical insights, we develop a practical, nonparametric approach for domain adaptation in a general setting, which can handle different types of distribution shifts.