An effective variant of the Hartigan $k$-means algorithm

arXiv cs.LG / 4/24/2026


Key Points

  • The paper addresses the classical k-means clustering problem, comparing the standard Lloyd’s algorithm with Hartigan’s algorithm, which typically performs better.
  • Prior work by Telgarsky and Vattani is cited for showing an improvement of roughly 5%–10% from Hartigan’s approach over Lloyd’s method.
  • The authors propose a very small variant of Hartigan’s algorithm that yields an additional 2%–5% improvement in clustering performance.
  • The reported gains tend to increase as the problem dimension and the number of clusters k become larger.
  • The work is announced on arXiv as a first version (v1), so it is a research preprint rather than a deployed system.
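
For orientation, the two classical update rules being compared can be written out as below. This is the standard textbook formulation, not taken from the paper itself; the symbols $C_j$, $\mu_j$ and $\Phi$ are notation assumed here, and the paper's own variant is not specified in this summary, so it is not shown.

```latex
% k-means objective for clusters C_1, ..., C_k with centroids \mu_j
\Phi(C_1,\dots,C_k) \;=\; \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,
\qquad \mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x .

% Lloyd: reassign every point to its nearest centroid, then recompute centroids
x \;\mapsto\; \arg\min_{j} \lVert x - \mu_j \rVert^2 .

% Hartigan: move a single point x from C_a (with |C_a| > 1) to C_b whenever
\frac{|C_b|}{|C_b| + 1}\,\lVert x - \mu_b \rVert^2
\;<\;
\frac{|C_a|}{|C_a| - 1}\,\lVert x - \mu_a \rVert^2 ,
% which is exactly the condition that the move lowers \Phi.
```

The size-dependent factors account for how both centroids shift when $x$ changes cluster. This is why Hartigan's rule can accept a move that Lloyd's nearest-centroid rule rejects.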

Abstract

The k-means problem is perhaps the classical clustering problem and often synonymous with Lloyd's algorithm (1957). It has become clear that Hartigan's algorithm (1975) gives better results in almost all cases; Telgarsky and Vattani note a typical improvement of 5%–10%. We point out that a very minor variation of Hartigan's method leads to another 2%–5% improvement; the improvement tends to become larger when either the dimension or k increases.
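
To make the Lloyd versus Hartigan comparison concrete, here is a minimal pure-Python sketch of both classical procedures on a three-point toy set. It implements the textbook Hartigan exchange step only; the paper's variant is not described in this summary and is not reproduced. All function names (`sq_dist`, `cluster_stats`, `cost`, `lloyd`, `hartigan`) and the toy data are illustrative assumptions, and the per-point recomputation of centroids is deliberately simple rather than efficient.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cluster_stats(points, labels, k):
    """Return (sizes, centers); an empty cluster has size 0 and center None."""
    sizes, centers = [0] * k, [None] * k
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        sizes[j] = len(members)
        if members:
            centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sizes, centers


def cost(points, labels, k):
    """k-means objective: sum of squared distances to the assigned centroid."""
    _, centers = cluster_stats(points, labels, k)
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def lloyd(points, labels, k, max_iter=100):
    """Lloyd: reassign all points to the nearest centroid until stable."""
    labels = list(labels)
    for _ in range(max_iter):
        sizes, centers = cluster_stats(points, labels, k)
        live = [j for j in range(k) if sizes[j] > 0]
        new_labels = [min(live, key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def hartigan(points, labels, k, tol=1e-12):
    """Textbook Hartigan: move one point at a time if the cost strictly drops."""
    labels = list(labels)
    moved = True
    while moved:
        moved = False
        for i, p in enumerate(points):
            a = labels[i]
            sizes, centers = cluster_stats(points, labels, k)
            if sizes[a] <= 1:
                continue  # emptying a singleton cluster never lowers the cost
            # Cost removed by taking p out of its current cluster a.
            gain = sizes[a] / (sizes[a] - 1) * sq_dist(p, centers[a])
            best, best_delta = a, -tol
            for b in range(k):
                if b == a:
                    continue
                # Cost added by inserting p into cluster b (0 if b is empty).
                penalty = (0.0 if sizes[b] == 0
                           else sizes[b] / (sizes[b] + 1) * sq_dist(p, centers[b]))
                if penalty - gain < best_delta:
                    best, best_delta = b, penalty - gain
            if best != a:
                labels[i] = best
                moved = True
    return labels


# Toy 1-D data (assumed for illustration) where Lloyd is already stable.
points = [(0.0,), (2.0,), (3.2,)]
start = [0, 0, 1]
lloyd_labels = lloyd(points, start, k=2)
hartigan_labels = hartigan(points, lloyd_labels, k=2)
print(lloyd_labels, cost(points, lloyd_labels, 2))
print(hartigan_labels, cost(points, hartigan_labels, 2))
```

On this toy set Lloyd stays at the starting partition `[0, 0, 1]` with cost 2.0, because the point 2.0 is closer to the centroid 1.0 than to 3.2. The exchange step still moves it and reaches `[0, 1, 1]` with cost about 0.72. Since every accepted move strictly lowers the objective, running the exchange step after Lloyd can never make the result worse.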