Distributional Shrinkage I: Universal Denoiser Beyond Tweedie's Formula

arXiv stat.ML / March 26, 2026

Key Points

  • The paper studies distributional denoising when the noise level is known but the noise distribution is unknown, using the observation model Y = X + σZ with known σ ∈ (0,1).
  • It proposes “universal” denoisers that are agnostic to both the signal distribution PX and the noise distribution, aiming to recover the distribution PX from PY rather than denoise individual samples.
  • Compared with the Bayes-optimal Tweedie denoiser, which attains O(σ^2) accuracy, the authors report order-of-magnitude improvements in distributional recovery: O(σ^4) and O(σ^6) accuracy in matching generalized moments and densities (a toy numerical check follows this list).
  • The proposed denoisers modify the standard shrinkage rule, halving the σ^2 ∇log q(y) term and adding higher-order corrections, and are derived from optimal transport ideas that approximate the Monge–Ampère equation.
  • The method can be implemented efficiently via score matching, leveraging gradients of the log density of PY.
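A quick way to see the accuracy claim is a toy Gaussian check. The sketch below is mine, not the paper's: it assumes X ~ N(0,1) and Z ~ N(0,1), so q is the N(0, 1+σ²) density and the score ∇log q(y) = −y/(1+σ²) is available in closed form (in practice a score-matching estimate would take its place).

```python
# Toy Monte Carlo check (illustrative sketch, not the paper's code),
# assuming X ~ N(0,1) and Z ~ N(0,1), so Y ~ N(0, 1+sigma^2) and the
# score of q is exactly -y / (1 + sigma^2).
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3
y = np.sqrt(1 + sigma**2) * rng.normal(size=1_000_000)  # samples of Y

score = -y / (1 + sigma**2)          # exact score: grad log q(y)
tweedie = y + sigma**2 * score       # Bayes-optimal T*(y)
t1 = y + 0.5 * sigma**2 * score      # proposed half-strength T1(y)

# Var(X) = 1 is the target for distributional recovery.
print(tweedie.var())  # ~ 1/(1+sigma^2) = 0.917, off by O(sigma^2)
print(t1.var())       # ~ 1 + sigma^4/(4(1+sigma^2)) = 1.002, off by O(sigma^4)
```

Tweedie's formula over-shrinks the distribution by a full σ² in variance, while halving the shrinkage leaves only an O(σ⁴) mismatch; that gap is exactly what the paper's accuracy orders quantify.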

Abstract

We study the problem of denoising when only the noise level is known, not the noise distribution. Independent noise Z corrupts a signal X, yielding the observation Y = X + \sigma Z with known \sigma \in (0,1). We propose \emph{universal} denoisers, agnostic to both signal and noise distributions, that recover the signal distribution P_X from P_Y. When the focus is on distributional recovery of P_X rather than on individual realizations of X, our denoisers achieve order-of-magnitude improvements over the Bayes-optimal denoiser derived from Tweedie's formula, which achieves O(\sigma^2) accuracy. They shrink P_Y toward P_X with O(\sigma^4) and O(\sigma^6) accuracy in matching generalized moments and densities. Drawing on optimal transport theory, our denoisers approximate the Monge--Amp\`ere equation with higher-order accuracy and can be implemented efficiently via score matching. Let q denote the density of P_Y. For distributional denoising, we propose replacing the Bayes-optimal denoiser,

\mathbf{T}^*(y) = y + \sigma^2 \nabla \log q(y),

with denoisers exhibiting less-aggressive distributional shrinkage,

\mathbf{T}_1(y) = y + \frac{\sigma^2}{2} \nabla \log q(y),

\mathbf{T}_2(y) = y + \frac{\sigma^2}{2} \nabla \log q(y) - \frac{\sigma^4}{8} \nabla\!\left( \frac{1}{2} \| \nabla \log q(y) \|^2 + \nabla \cdot \nabla \log q(y) \right).