On Model-Based Clustering With Entropic Optimal Transport

arXiv stat.ML / 5/6/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper proposes a new model-based clustering framework where optimizing log-likelihood is replaced by an entropic optimal transport (EOT)–based loss function.
  • It argues that the EOT loss shares the same global optimum as the original log-likelihood objective, but has a substantially better-behaved optimization landscape.
  • Because the log-likelihood objective is nonconvex and prone to many spurious local optima, the new approach aims to reduce the need for multiple random initializations.
  • The authors introduce and analyze a Sinkhorn-EM algorithm to optimize the EOT loss, showing convergence rates comparable to standard EM.
  • Extensive experiments and two real-world clustering applications (C. elegans microscopy image segmentation and spatial transcriptomics) show the EOT-based method outperforms log-likelihood optimization in practice.
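
The Sinkhorn-EM idea summarized above can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: it assumes a spherical Gaussian mixture with a known, shared variance and known mixture weights, so the means are the only unknowns; the function `sinkhorn_em` and all of its parameters are our own naming. The only change relative to standard EM is that the E-step's responsibilities are computed by a Sinkhorn projection that also constrains the cluster sizes to match the mixture weights.

```python
import numpy as np

def sinkhorn_em(X, means, weights, sigma=1.0, n_iters=30, sinkhorn_iters=50):
    """Illustrative Sinkhorn-EM for a spherical Gaussian mixture with
    known weights and known common variance (means are the only unknowns).
    This sketch is ours, not the authors' code."""
    n, _ = X.shape
    for _ in range(n_iters):
        # Component log-densities, up to an additive constant.
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        logP = -sq / (2.0 * sigma ** 2)
        # E-step via Sinkhorn projection (in log space): alternately scale
        # columns so cluster k receives total mass n * weights[k], and rows
        # so each point distributes one unit of mass -- unlike plain EM,
        # which only normalizes rows.
        for _ in range(sinkhorn_iters):
            logP += np.log(n * weights) - np.logaddexp.reduce(logP, axis=0)
            logP -= np.logaddexp.reduce(logP, axis=1, keepdims=True)
        P = np.exp(logP)  # responsibilities with balanced cluster marginals
        # M-step: weighted means, exactly as in standard EM.
        means = (P.T @ X) / P.sum(axis=0)[:, None]
    return means

# Toy usage: two well-separated clusters, deliberately poor initialization.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-4.0, 1.0, (100, 2)),
               rng.normal(4.0, 1.0, (100, 2))])
est = sinkhorn_em(X, np.array([[-1.0, 0.0], [1.0, 0.0]]),
                  np.array([0.5, 0.5]))
```

The column-marginal constraint is what removes the degenerate configurations that plague plain EM, such as two components collapsing onto the same cluster.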

Abstract

We develop a new methodology for model-based clustering. Optimizing the log-likelihood provides a principled statistical framework for clustering, with solutions found via the EM algorithm. However, because the log-likelihood is nonconvex, only convergence to stationary points can be guaranteed, and practitioners often use multiple starting points in the hope that one will converge to the global solution. We consider a new loss function based on entropic optimal transport that shares the same global optimum as the log-likelihood but has a much better-behaved landscape, thereby avoiding the spurious local optima that are pervasive with the log-likelihood. Just as the log-likelihood can be optimized by the EM algorithm, this new loss can be optimized by the Sinkhorn-EM algorithm, which we show converges at a rate comparable to that of EM. Through extensive numerical experiments and two real-world applications, image segmentation in C. elegans microscopy and clustering in spatial transcriptomics, we show that this new loss outperforms log-likelihood optimization, indicating that it represents a valuable clustering methodology for practitioners.