Distributional Shrinkage II: Higher-Order Scores Encode Brenier Map

arXiv stat.ML / 3/26/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

Key Points

  • The paper studies the additive Gaussian observation model Y = X + σZ and develops a hierarchy of denoisers that use only higher-order score functions of the observed distribution Q, without needing the unknown signal law P.
  • For each order K, the denoiser T_K uses score information up to order 2K−1 and yields a Wasserstein error bound W_r(T_K#Q, P) = O(σ^{2(K+1)}), improving as K increases.
  • In the limit, the denoiser T_∞ converges to the monotone optimal transport map from Q to P (the Brenier map), showing that the Brenier map is encoded by these higher-order scores.
  • The authors give a full combinatorial characterization of the hierarchy via partial Bell polynomial recursions, clarifying how integer-partition structure relates higher-order Fisher-type quantities to optimal transport.
  • They also analyze how to estimate the required higher-order scores from n i.i.d. samples of Q using either plug-in kernel density estimation or higher-order score matching, with corresponding convergence rates.

Abstract

Consider the additive Gaussian model Y = X + \sigma Z, where X \sim P is an unknown signal, Z \sim N(0,1) is independent of X, and \sigma > 0 is known. Let Q denote the law of Y. We construct a hierarchy of denoisers T_0, T_1, \ldots, T_\infty \colon \mathbb{R} \to \mathbb{R} that depend only on higher-order score functions q^{(m)}/q, m \geq 1, of Q and require no knowledge of the law P. The K-th order denoiser T_K involves scores up to order 2K{-}1 and satisfies W_r(T_K \sharp Q, P) = O(\sigma^{2(K+1)}) for every r \geq 1; in the limit, T_\infty recovers the monotone optimal transport map (Brenier map) pushing Q onto P. We provide a complete characterization of the combinatorial structure governing this hierarchy through partial Bell polynomial recursions, making precise how higher-order score functions encode the Brenier map. We further establish rates of convergence for estimating these scores from n i.i.d.\ draws from Q under two complementary strategies: (i) plug-in kernel density estimation, and (ii) higher-order score matching. The construction reveals a precise interplay among higher-order Fisher-type information, optimal transport, and the combinatorics of integer partitions.