Scalable Learning from Probability Measures with Mean Measure Quantization

arXiv stat.ML, March 24, 2026


Key Points

  • The paper studies statistical learning settings where each data item is represented as a probability measure and compares/manipulates them using optimal transport (OT).
  • To address OT’s prohibitive cost for measures with large support, it proposes a mean-measure quantization scheme that approximates every input measure by a K-point discrete measure on a shared support.
  • It proves consistency of the quantized representation and provides convergence guarantees for multiple OT-based downstream tasks computed from the quantized measures.
  • Experiments on both synthetic and real datasets show that the shared-support approach matches the accuracy of quantizing each measure individually while substantially reducing runtime.
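The shared-support idea can be illustrated with a minimal sketch. This is not the paper's algorithm, only a plausible instantiation under common assumptions: pool the samples from all input measures, pick K shared support points by K-means (Lloyd's algorithm, 1-D here for brevity), then represent each measure by the fraction of its mass falling nearest to each support point. The function names `kmeans` and `quantize` are hypothetical.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    # Plain Lloyd's algorithm on 1-D points (illustrative, not the paper's method).
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[j].append(p)
        # Keep the old center if a cluster goes empty.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

def quantize(measure, support):
    # Push each sample of `measure` to its nearest shared support point;
    # return the weight vector of the resulting K-point discrete measure.
    k = len(support)
    w = [0.0] * k
    for p in measure:
        j = min(range(k), key=lambda j: (p - support[j]) ** 2)
        w[j] += 1.0 / len(measure)
    return w

# Two toy "measures" observed as samples; one shared K=2 support for both.
m1 = [0.10, 0.20, 0.90]
m2 = [0.15, 0.85, 0.95]
support = kmeans(m1 + m2, 2)
w1, w2 = quantize(m1, support), quantize(m2, support)
```

Because every quantized measure lives on the same K points, downstream comparisons reduce to operations on length-K weight vectors, which is where the runtime savings come from.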

Abstract

We consider statistical learning problems in which data are observed as a set of probability measures. Optimal transport (OT) is a popular tool to compare and manipulate such objects, but its computational cost becomes prohibitive when the measures have large support. We study a quantization-based approach in which all input measures are approximated by K-point discrete measures sharing a common support. We establish consistency of the resulting quantized measures. We further derive convergence guarantees for several OT-based downstream tasks computed from the quantized measures. Numerical experiments on synthetic and real datasets demonstrate that the proposed approach achieves performance comparable to individual quantization while substantially reducing runtime.
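As a concrete illustration of why a common support helps downstream OT tasks: once two measures share the same sorted 1-D support, the Wasserstein-1 distance between them has a closed form as the integral of the absolute difference of their CDFs, so comparing quantized measures costs O(K) instead of solving a full transport problem. This is a standard 1-D identity, not a result specific to the paper; the function name is hypothetical.

```python
def w1_shared_support(w_a, w_b, support):
    # 1-D Wasserstein-1 distance between two discrete measures on the
    # same sorted support: accumulate |CDF_a - CDF_b| over each gap.
    dist, cdf_a, cdf_b = 0.0, 0.0, 0.0
    for i in range(len(support) - 1):
        cdf_a += w_a[i]
        cdf_b += w_b[i]
        dist += abs(cdf_a - cdf_b) * (support[i + 1] - support[i])
    return dist

# All mass at 0 vs. all mass at 2: the mass travels distance 2.
print(w1_shared_support([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 2.0]))
```

In higher dimensions no such closed form exists, but the shared support still means the pairwise cost matrix between support points is computed once and reused across every pair of measures.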
