Measuring Differences between Conditional Distributions using Kernel Embeddings

arXiv stat.ML / 5/5/2026


Key Points

  • The paper proposes a unified theoretical framework for comparing conditional distributions using kernel embeddings in RKHS, introducing conditional maximum mean discrepancy (CMMD).
  • It defines a family of CMMD metrics (“levels”) including CMMD_0 (conditional mean operators), CMMD_1 (conditional mean embeddings), and CMMD_2 (joint mean embeddings), and further generalizes to level s.
  • The authors clarify assumptions and provide mathematical relationships between the levels by using an operator-based smoothing perspective.
  • They review existing estimators and introduce a new doubly robust estimator for CMMD that stays consistent when at least one of the underlying models is correctly specified.
  • Experiments show that CMMD can capture complex conditional dependencies and is effective for statistical testing of conditional distribution differences.
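To ground the key points above: CMMD generalizes the (unconditional) maximum mean discrepancy, which compares two distributions via the RKHS distance between their mean embeddings. A minimal sketch of the standard empirical MMD with a Gaussian RBF kernel is shown below; the function names and the bandwidth choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=1.0):
    # Biased empirical estimate of MMD^2 between samples X ~ P and Y ~ Q:
    # ||mu_P - mu_Q||^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()
```

For two samples from the same distribution the estimate is close to zero, while a mean shift between the samples drives it up; this is the quantity that the conditional variants in the paper extend to distributions of Y given X.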

Abstract

Comparing conditional distributions is a fundamental challenge in statistics and machine learning, with applications across a wide range of domains. While methods that measure discrepancies via kernel embeddings of distributions in a reproducing kernel Hilbert space (RKHS) provide powerful non-parametric tools, the existing literature remains fragmented and lacks a unified theoretical treatment. This paper addresses this gap by establishing a coherent framework for studying kernel-based measures of divergence between conditional distributions, which we refer to collectively as the conditional maximum mean discrepancy (CMMD). The CMMD comprises a family of metrics, which we call levels, with three special cases, each using a different type of RKHS embedding: CMMD_0 (conditional mean operators), CMMD_1 (conditional mean embeddings), and CMMD_2 (joint mean embeddings). We additionally introduce a general level-s CMMD, clarifying the required assumptions and establishing mathematical connections between the levels through the lens of operator-based smoothing. Beyond reviewing previously proposed estimators, we introduce a novel doubly robust estimator for the CMMD that remains consistent provided at least one of the underlying models is correctly specified. Numerical experiments demonstrate that the CMMD effectively captures complex conditional dependencies for statistical testing.
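To make the CMMD_1 level concrete, the sketch below estimates each group's conditional mean embedding by kernel ridge regression and compares the two embeddings at query points x via their squared RKHS distance. This is a generic plug-in construction under assumed Gaussian kernels and regularization, not the paper's doubly robust estimator; all function names and hyperparameters are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def cme_weights(Xtr, Xq, gamma=1.0, lam=1e-3):
    # Kernel-ridge weights alpha(x) = (K + n*lam*I)^{-1} k(Xtr, x), so that
    # mu_{Y|X=x} is approximated by sum_i alpha_i(x) * phi(y_i).
    n = len(Xtr)
    K = rbf_kernel(Xtr, Xtr, gamma)
    kq = rbf_kernel(Xtr, Xq, gamma)          # shape (n, m) for m query points
    return np.linalg.solve(K + n * lam * np.eye(n), kq)

def cmmd1_sq(Xp, Yp, Xq_, Yq, Xquery, gamma=1.0, lam=1e-3):
    # Squared RKHS distance between the two estimated conditional mean
    # embeddings at each query point: ||mu^P_{Y|X=x} - mu^Q_{Y|X=x}||^2.
    a = cme_weights(Xp, Xquery, gamma, lam)  # weights for group P
    b = cme_weights(Xq_, Xquery, gamma, lam) # weights for group Q
    Lpp = rbf_kernel(Yp, Yp, gamma)
    Lqq = rbf_kernel(Yq, Yq, gamma)
    Lpq = rbf_kernel(Yp, Yq, gamma)
    return (np.einsum('im,ij,jm->m', a, Lpp, a)
            + np.einsum('im,ij,jm->m', b, Lqq, b)
            - 2 * np.einsum('im,ij,jm->m', a, Lpq, b))
```

When the two groups share the same conditional law of Y given X, the estimate at a query point is small; when the conditionals differ (e.g. Y = X versus Y = -X plus noise), it is markedly larger, which is the behavior a conditional two-sample test exploits.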