Convolutional Maximum Mean Discrepancy for Inference in Noisy Data

arXiv stat.ML / April 15, 2026


Key Points

  • The paper proposes a new inference framework for data contaminated by measurement error, including potentially heteroscedastic noise drawn from a known distribution.
  • It introduces convolutional Maximum Mean Discrepancy (convMMD), which compares distributions after convolving with the noise while preserving metric validity under standard kernel assumptions.
  • The authors derive finite-sample deviation bounds that remain unaffected by measurement error and show an equivalence between hypothesis testing under noise and kernel smoothing.
  • They present a convMMD-based estimator with proofs of consistency and asymptotic normality, along with an efficient implementation using stochastic gradient descent.
  • Experiments and real-world applications (notably in astronomy and social sciences) demonstrate the method’s practical effectiveness under noisy observational settings.
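The convMMD idea is easiest to see in one dimension with a Gaussian kernel and Gaussian measurement noise, where the noise convolution has a closed form: averaging the kernel over noise with known variances inflates the bandwidth, which is the kernel-smoothing equivalence noted above. The sketch below is illustrative only, not the authors' implementation; the Gaussian kernel, the V-statistic estimator, and all variable names are assumptions.

```python
import numpy as np

def gaussian_conv_kernel(a, b, sa2, sb2, h2=1.0):
    # Gaussian kernel k(u, v) = exp(-(u - v)^2 / (2 h2)), analytically
    # averaged over Gaussian noise with known variances sa2 (on a) and
    # sb2 (on b): the bandwidth inflates to h2 + sa2 + sb2, with a
    # sqrt(h2 / (h2 + sa2 + sb2)) scale factor.
    s2 = h2 + sa2 + sb2
    return np.sqrt(h2 / s2) * np.exp(-(a - b) ** 2 / (2.0 * s2))

def conv_mmd2(x, y, sx2, sy2, h2=1.0):
    # Biased (V-statistic) estimate of squared convMMD between 1-d
    # samples x, y with per-observation noise variances sx2, sy2.
    kxx = gaussian_conv_kernel(x[:, None], x[None, :], sx2[:, None], sx2[None, :], h2)
    kyy = gaussian_conv_kernel(y[:, None], y[None, :], sy2[:, None], sy2[None, :], h2)
    kxy = gaussian_conv_kernel(x[:, None], y[None, :], sx2[:, None], sy2[None, :], h2)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
n = 500
sx2 = rng.uniform(0.1, 0.5, n)   # known, heteroscedastic noise variances
sy2 = rng.uniform(0.1, 0.5, n)
x = rng.normal(0.0, 1.0, n) + rng.normal(0.0, np.sqrt(sx2))       # noisy draws from N(0, 1)
y_same = rng.normal(0.0, 1.0, n) + rng.normal(0.0, np.sqrt(sy2))  # same latent distribution
y_diff = rng.normal(2.0, 1.0, n) + rng.normal(0.0, np.sqrt(sy2))  # shifted latent distribution

same = conv_mmd2(x, y_same, sx2, sy2)   # small: the distributions match
diff = conv_mmd2(x, y_diff, sx2, sy2)   # clearly larger: they do not
```

Although the inflated bandwidth varies per pair of points, the convolved kernel remains positive semi-definite: it is an inner product of noise-smoothed feature maps, which is what preserves metric validity under the standard kernel conditions the paper invokes.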

Abstract

Modern data analyses frequently encounter settings where samples of variables are contaminated by measurement error. Ignoring measurement noise can substantially degrade statistical inference, while existing correction techniques are often computationally costly and inefficient. Recent advances in kernel methods, particularly those based on Maximum Mean Discrepancy (MMD), have enabled flexible, distribution-free inference, yet typically assume precise data and overlook contamination by measurement error. In this work, we introduce a novel framework for inference with samples corrupted by potentially heteroscedastic noise from a known distribution. Central to our approach is the convolutional MMD (convMMD), which compares distributions after noise convolution and retains metric validity under standard kernel conditions. We establish finite-sample deviation bounds that are unaffected by measurement error and prove an equivalence between testing under noise and kernel smoothing. Leveraging these insights, we introduce a convMMD-based estimator for inference with noisy, heteroscedastic observations. We establish its consistency and asymptotic normality, and provide an efficient implementation using stochastic gradient descent. We demonstrate the practical effectiveness of our approach through simulations and applications in astronomy and social sciences.
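The SGD-based estimator mentioned in the abstract can be illustrated with a toy minimum-distance sketch: fit the location of a Gaussian model to noisy observations by descending a stochastic gradient of the squared convMMD, with model draws reparameterized so the gradient flows through the parameter. Everything here is an assumption for illustration, not the paper's algorithm: the pairing of model draws with resampled noise variances, the fixed step size, and the Gaussian kernel with analytically convolved model-side noise.

```python
import numpy as np

def k_conv(a, b, sa2, sb2, h2=1.0):
    # Gaussian kernel analytically convolved with Gaussian noise of
    # variances sa2 and sb2 on its two arguments.
    s2 = h2 + sa2 + sb2
    return np.sqrt(h2 / s2) * np.exp(-(a - b) ** 2 / (2.0 * s2))

rng = np.random.default_rng(1)
n = 400
sig2 = rng.uniform(0.1, 0.4, n)                 # known noise variances
x = rng.normal(1.5, 1.0, n) + rng.normal(0.0, np.sqrt(sig2))  # noisy data, true mean 1.5

theta, lr = 0.0, 0.5
for _ in range(300):
    xb = x[rng.integers(0, n, 100)]             # data mini-batch (noise already realized)
    z = theta + rng.normal(0.0, 1.0, 100)       # reparameterized draws from model N(theta, 1)
    sz2 = rng.choice(sig2, 100)                 # noise variances paired with model draws
    s2 = 1.0 + sz2[None, :]                     # cross-kernel bandwidth: model noise analytic
    kxz = k_conv(xb[:, None], z[None, :], 0.0, sz2[None, :])
    # Gradient of squared convMMD w.r.t. theta: only the cross term moves
    # (the model-model term depends on differences of reparameterized
    # draws, so its gradient in theta vanishes).
    grad = -2.0 * np.mean((xb[:, None] - z[None, :]) / s2 * kxz)
    theta -= lr * grad
# theta ends up near the true latent mean of 1.5
```

Only the cross term contributes to the gradient because, with noise handled analytically, the model-model kernel depends on differences z_i - z_j = eps_i - eps_j, which do not involve theta; this keeps each SGD step cheap, consistent with the efficiency claim in the abstract.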