Measuring Differences between Conditional Distributions using Kernel Embeddings

arXiv stat.ML / 5/5/2026


Key Points

  • The paper proposes a unified theoretical framework for comparing conditional distributions using kernel embeddings in RKHS, introducing conditional maximum mean discrepancy (CMMD).
  • It defines a family of CMMD metrics (“levels”) including CMMD_0 (conditional mean operators), CMMD_1 (conditional mean embeddings), and CMMD_2 (joint mean embeddings), and further generalizes to level s.
  • The authors clarify assumptions and provide mathematical relationships between the levels by using an operator-based smoothing perspective.
  • They review existing estimators and introduce a new doubly robust estimator for CMMD that stays consistent when at least one of the underlying models is correctly specified.
  • Experiments show that CMMD can capture complex conditional dependencies and is effective for statistical testing of conditional distribution differences.
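To ground the key points above: CMMD generalizes the (unconditional) maximum mean discrepancy, which compares two distributions via the RKHS distance between their mean embeddings. A minimal sketch of the standard empirical MMD with a Gaussian RBF kernel is shown below; the function names and the bandwidth choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=1.0):
    # Biased empirical estimate of MMD^2 between samples X ~ P and Y ~ Q:
    # ||mu_P - mu_Q||^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()
```

For two samples from the same distribution the estimate is close to zero, while a mean shift between the samples drives it up; this is the quantity that the conditional variants in the paper extend to distributions of Y given X.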

Abstract

Comparing conditional distributions is a fundamental challenge in statistics and machine learning, with applications across a wide range of domains. While methods that measure discrepancies via kernel embeddings of distributions in a reproducing kernel Hilbert space (RKHS) provide powerful non-parametric tools, the existing literature remains fragmented and lacks a unified theoretical treatment. This paper addresses this gap by establishing a coherent framework for studying kernel-based measures of divergence between conditional distributions, which we refer to collectively as the conditional maximum mean discrepancy (CMMD). The CMMD comprises a family of metrics, which we call levels, with three special cases, each using a different type of RKHS embedding: CMMD_0 (conditional mean operators), CMMD_1 (conditional mean embeddings), and CMMD_2 (joint mean embeddings). We additionally introduce a general level-s CMMD, clarifying the required assumptions and establishing mathematical connections between the levels through the lens of operator-based smoothing. Beyond reviewing previously proposed estimators, we introduce a novel doubly robust estimator for the CMMD that remains consistent provided at least one of the underlying models is correctly specified. Numerical experiments demonstrate that the CMMD effectively captures complex conditional dependencies for statistical testing.
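To make the CMMD_1 level concrete, the sketch below estimates each group's conditional mean embedding by kernel ridge regression and compares the two embeddings at query points x via their squared RKHS distance. This is a generic plug-in construction under assumed Gaussian kernels and regularization, not the paper's doubly robust estimator; all function names and hyperparameters are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def cme_weights(Xtr, Xq, gamma=1.0, lam=1e-3):
    # Kernel-ridge weights alpha(x) = (K + n*lam*I)^{-1} k(Xtr, x), so that
    # mu_{Y|X=x} is approximated by sum_i alpha_i(x) * phi(y_i).
    n = len(Xtr)
    K = rbf_kernel(Xtr, Xtr, gamma)
    kq = rbf_kernel(Xtr, Xq, gamma)          # shape (n, m) for m query points
    return np.linalg.solve(K + n * lam * np.eye(n), kq)

def cmmd1_sq(Xp, Yp, Xq_, Yq, Xquery, gamma=1.0, lam=1e-3):
    # Squared RKHS distance between the two estimated conditional mean
    # embeddings at each query point: ||mu^P_{Y|X=x} - mu^Q_{Y|X=x}||^2.
    a = cme_weights(Xp, Xquery, gamma, lam)  # weights for group P
    b = cme_weights(Xq_, Xquery, gamma, lam) # weights for group Q
    Lpp = rbf_kernel(Yp, Yp, gamma)
    Lqq = rbf_kernel(Yq, Yq, gamma)
    Lpq = rbf_kernel(Yp, Yq, gamma)
    return (np.einsum('im,ij,jm->m', a, Lpp, a)
            + np.einsum('im,ij,jm->m', b, Lqq, b)
            - 2 * np.einsum('im,ij,jm->m', a, Lpq, b))
```

When the two groups share the same conditional law of Y given X, the estimate at a query point is small; when the conditionals differ (e.g. Y = X versus Y = -X plus noise), it is markedly larger, which is the behavior a conditional two-sample test exploits.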