Distributionally Robust K-Means Clustering

arXiv stat.ML / 4/14/2026


Key Points

  • The paper addresses K-means’ brittleness to outliers, distribution shift, and small samples by reframing it as Lloyd–Max quantization of the empirical distribution.
  • It introduces a distributionally robust K-means objective by assuming the true population distribution lies within a Wasserstein-2 ball around the empirical distribution and solving a minimax problem for worst-case expected squared distances.
  • The authors derive a tractable dual that yields a soft-clustering method with smoothly weighted assignments instead of hard assignments.
  • They present an efficient block coordinate descent algorithm with proven monotonic decrease of the objective and local linear convergence guarantees.
  • Experiments on benchmarks and large synthetic datasets show improved robustness and better outlier detection under noise and distributional perturbations.
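The soft-clustering block coordinate descent described above can be sketched in a minimal form. This is an illustrative, generic softmax-weighted Lloyd iteration, not the paper's Wasserstein-robust dual: the temperature parameter `tau`, the initialization scheme, and the function name `soft_kmeans` are assumptions made for the sketch.

```python
import numpy as np

def soft_kmeans(X, k, tau=1.0, iters=50, seed=0):
    """Illustrative soft-clustering block coordinate descent.

    A generic softmax-weighted Lloyd iteration, NOT the paper's
    Wasserstein-robust dual; `tau` and the init scheme are assumptions.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Block 1: soft assignment weights w_ij ∝ exp(-||x_i - c_j||^2 / tau),
        # replacing K-means' hard argmin assignments with smooth ones.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        logits = -d2 / tau
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        w = np.exp(logits)
        w /= w.sum(axis=1, keepdims=True)
        # Block 2: each center becomes the weighted mean of all points.
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return centers, w
```

As `tau` shrinks toward zero the weights concentrate on the nearest center and the iteration recovers ordinary hard-assignment Lloyd's algorithm, which is why soft schemes of this kind are often described as smoothed K-means.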

Abstract

K-means clustering is a workhorse of unsupervised learning, but it is notoriously brittle to outliers, distribution shifts, and limited sample sizes. Viewing K-means as Lloyd–Max quantization of the empirical distribution, we develop a distributionally robust variant that protects against such pathologies. We posit that the unknown population distribution lies within a Wasserstein-2 ball around the empirical distribution. In this setting, one seeks cluster centers that minimize the worst-case expected squared distance over this ambiguity set, leading to a minimax formulation. A tractable dual yields a soft-clustering scheme that replaces hard assignments with smoothly weighted ones. We propose an efficient block coordinate descent algorithm with provable monotonic decrease and local linear convergence. Experiments on standard benchmarks and large-scale synthetic data demonstrate substantial gains in outlier detection and robustness to noise.
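A plausible formalization of the minimax objective in the abstract, assuming an ambiguity radius $\varepsilon$ (a symbol not named in the source) for the Wasserstein-2 ball around the empirical distribution $\hat{P}_n$, would read:

```latex
\min_{c_1,\dots,c_K} \;
\sup_{Q \,:\, W_2(Q,\hat{P}_n) \le \varepsilon} \;
\mathbb{E}_{x \sim Q}\!\left[ \min_{1 \le j \le K} \| x - c_j \|^2 \right]
```

The inner supremum ranges over all distributions $Q$ within Wasserstein-2 distance $\varepsilon$ of $\hat{P}_n$, so the centers are chosen against the worst-case distribution in that ball rather than the empirical data alone.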