Optimal Demixing of Nonparametric Densities

arXiv stat.ML / 3/31/2026


Key Points

  • The paper studies optimal demixing of nonparametric density functions when observed group-level densities are convex mixtures of unknown component densities.
  • It proposes a modified kernel density estimator that uses group-specific weights derived from a topic-modeling approach on histogram vectors, with an additional de-biasing step via U-statistics.
  • Under smoothness assumptions (Nikol’ski class with parameter β), the authors prove that the integrated squared error achieves a convergence rate depending on the number of groups n, mixture size K, dimension d, and per-group sample size N.
  • A matching lower bound is provided, indicating the proposed estimator is rate-optimal for the stated setting.
  • The work generalizes continuous-variable topic modeling and connects to applications in machine learning and LLMs that use word embeddings.
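The data-generating model described above can be illustrated with a short simulation. This is not the authors' code: the component densities (a Gaussian and a Laplace), the group count `n`, the per-group sample size `N`, and the Dirichlet draw of the mixed-membership vectors are all arbitrary choices made here for demonstration.

```python
import numpy as np

# Illustrative simulation of the model f_i(x) = sum_k pi_i(k) g_k(x):
# n groups, the i-th containing N samples from a mixture of K unknown
# component densities with group-specific mixed-membership weights pi_i.
rng = np.random.default_rng(0)

n, N, K = 5, 200, 2  # groups, samples per group, mixture components

# Two simple 1-d component densities standing in for g_1, g_2
# (hypothetical choices; the paper treats them as nonparametric).
samplers = [
    lambda size: rng.normal(loc=-1.0, scale=0.5, size=size),
    lambda size: rng.laplace(loc=2.0, scale=0.7, size=size),
]

# Mixed-membership vectors pi_i on the K-simplex, drawn from a Dirichlet.
pi = rng.dirichlet(np.ones(K), size=n)  # shape (n, K), rows sum to 1

groups = []
for i in range(n):
    # For each sample, pick component k with probability pi_i(k),
    # then draw the sample from g_k.
    ks = rng.choice(K, size=N, p=pi[i])
    x = np.empty(N)
    for k in range(K):
        mask = ks == k
        x[mask] = samplers[k](mask.sum())
    groups.append(x)

groups = np.stack(groups)  # shape (n, N): one row of samples per group
```

The demixing task is then to recover the two component densities from `groups` alone, without observing `pi` or the component labels `ks`.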

Abstract

Motivated by applications in statistics and machine learning, we consider the problem of unmixing convex combinations of nonparametric densities. Suppose we observe n groups of samples, where the ith group consists of N_i independent samples from a d-variate density f_i(x)=\sum_{k=1}^K \pi_i(k)g_k(x). Here, each g_k(x) is a nonparametric density, and each \pi_i is a K-dimensional mixed-membership vector. We aim to estimate g_1(x), \ldots, g_K(x). This problem generalizes topic modeling from discrete to continuous variables and finds applications in LLMs with word embeddings. In this paper, we propose an estimator that modifies the classical kernel density estimator by assigning group-specific weights, which are computed by topic modeling on histogram vectors and de-biased by U-statistics. For any \beta>0, assuming that each g_k(x) belongs to the Nikol'ski class with smoothness parameter \beta, we show that the sum of integrated squared errors of the constructed estimators attains a convergence rate depending on n, K, d, and the per-group sample size N. We also provide a matching lower bound, which suggests that our estimator is rate-optimal.
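The core of the proposed estimator is a weighted combination of per-group kernel density estimates, \hat g_k(x) = \sum_i w_i(k) \hat f_i(x). The sketch below shows only that combination step with a Gaussian kernel; the crucial part of the paper, computing the weights via topic modeling on histogram vectors and de-biasing them with U-statistics, is not reproduced here, so the weights are simply passed in as a hypothetical input.

```python
import numpy as np

def weighted_kde(groups, weights, grid, h):
    """Group-weighted kernel density estimate (illustrative sketch).

    Computes ghat(x) = sum_i weights[i] * (1/(N*h)) * sum_j phi((x - X_ij)/h)
    on the given grid, where phi is the standard Gaussian kernel and
    groups[i] holds the N samples of group i. In the paper, the weights
    for component k come from topic modeling plus U-statistic de-biasing;
    here they are an arbitrary input.
    """
    phi = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Classical KDE of each group on the grid: shape (n, len(grid)).
    per_group = phi((grid[None, None, :] - groups[:, :, None]) / h).mean(axis=1) / h
    # Weighted combination across groups: shape (len(grid),).
    return weights @ per_group

# Hypothetical usage with made-up data and uniform weights.
rng = np.random.default_rng(1)
groups = rng.normal(size=(4, 300))       # 4 groups, 300 samples each
grid = np.linspace(-4.0, 4.0, 201)
ghat = weighted_kde(groups, np.full(4, 0.25), grid, h=0.3)
```

Since the weights here are uniform and sum to one, `ghat` is itself a density estimate; with component-specific (possibly signed, de-biased) weights as in the paper, the combination instead targets an individual component g_k.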