K-Means as a Radial Basis Function Network: A Variational and Gradient-Based Equivalence [R]

Reddit r/MachineLearning / 5/4/2026

Key Points

  • The paper reformulates K-means as a continuous, fully differentiable optimization problem by replacing hard assignments with soft responsibilities and using a smooth clustering objective.
  • It proves a Γ-convergence (Gamma convergence) result showing that the proposed smooth objective recovers standard K-means in the zero-temperature limit.
  • The authors argue that the usual alternating updates for K-means are not fundamental, but rather emerge naturally from the underlying variational formulation as the smoothing parameter vanishes.
  • The work establishes a precise equivalence with Radial Basis Function (RBF) networks, framing centers, assignments, and loss as parts of one unified objective where clustering vs. neural modeling differs mainly by the degree of smoothness.
  • The authors suggest this approach could embed clustering directly into larger end-to-end models without treating it as a separate training block, while noting that practical stability and usefulness remain unclear and inviting critique.

K-means is basically an RBF network

I have been working on a formulation of K-means as a continuous optimization problem instead of a discrete algorithm. The idea is to replace hard assignments with soft responsibilities and define a smooth objective that preserves the clustering structure while making the system fully differentiable and trainable end to end.
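
To make that concrete, here is a minimal numpy sketch of the kind of smoothed objective I mean, using the standard soft-min / entropy-regularized form with a temperature T (a sketch of the idea, not necessarily the exact objective from the write-up):

```python
import numpy as np

def soft_responsibilities(X, C, T):
    """Soft assignments: softmax over negative squared distances."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)  # (n, k)
    logits = -d2 / T
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    R = np.exp(logits)
    return R / R.sum(axis=1, keepdims=True)                   # rows sum to 1

def smooth_kmeans_loss(X, C, T):
    """Soft-min free energy: -T * sum_i log sum_k exp(-||x_i - c_k||^2 / T).
    Smooth in C for every T > 0, so it can be minimized by gradient descent,
    and it tends to sum_i min_k ||x_i - c_k||^2 as T -> 0."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    m = (-d2 / T).max(axis=1)
    lse = m + np.log(np.exp(-d2 / T - m[:, None]).sum(axis=1))
    return float((-T * lse).sum())
```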

The main result is a Gamma-convergence analysis showing that this objective recovers standard K-means in the zero-temperature limit. So the usual alternating updates are not fundamental; they emerge from a continuous variational problem as the smoothing vanishes.
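
A quick numeric check of the limit: as T shrinks, the soft responsibilities concentrate on the nearest center and match Lloyd's hard argmin assignment (toy data, same softmax construction as above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))   # toy points
C = rng.normal(size=(3, 2))   # toy centers

d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
hard = d2.argmin(axis=1)      # Lloyd-style hard assignment

for T in (1.0, 0.1, 0.01):
    logits = -d2 / T
    logits -= logits.max(axis=1, keepdims=True)
    R = np.exp(logits)
    R /= R.sum(axis=1, keepdims=True)
    print(T, (R.argmax(axis=1) == hard).all(), R.max(axis=1).round(3))
# rows of R tend to one-hot vectors as T -> 0
```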

This also gives a precise connection with Radial Basis Function networks. Under this formulation, centers, assignments, and loss are part of the same objective, and the difference between clustering and a neural model is just the level of smoothness.
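
The RBF side of the equivalence is easy to see in code: the hidden layer of a normalized Gaussian RBF network computes exactly the soft responsibilities, with the kernel width playing the role of an inverse temperature (again a sketch; W below is a hypothetical readout matrix):

```python
import numpy as np

def normalized_rbf_layer(X, C, gamma):
    """Normalized Gaussian RBF activations, phi_k(x) proportional to
    exp(-gamma * ||x - c_k||^2). With gamma = 1/T these are identical
    to the soft responsibilities from the clustering view."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-gamma * (d2 - d2.min(axis=1, keepdims=True)))  # stabilized
    return phi / phi.sum(axis=1, keepdims=True)

# A linear readout turns the clustering layer into a regressor:
#   y_hat = normalized_rbf_layer(X, C, gamma) @ W
# so centers, assignments, and loss live in one differentiable graph.
```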

One thing I find interesting is that this removes the need to treat clustering as a separate block. In principle it can be embedded directly inside larger models and optimized jointly, although it is not obvious how stable or useful that is in practice.
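
As a rough illustration of what embedding this in a larger model could look like, here is a hypothetical PyTorch module (names and defaults are mine, not from the paper):

```python
import torch
import torch.nn as nn

class SoftKMeansLayer(nn.Module):
    """Differentiable clustering block: the centers are ordinary
    parameters, so an upstream encoder and this layer can be trained
    jointly by gradient descent."""
    def __init__(self, n_clusters, dim, temperature=0.5):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_clusters, dim))
        self.temperature = temperature

    def forward(self, x):                       # x: (batch, dim)
        d2 = torch.cdist(x, self.centers) ** 2  # (batch, n_clusters)
        return torch.softmax(-d2 / self.temperature, dim=1)

# e.g. encoder -> SoftKMeansLayer -> task head, with the smooth clustering
# loss from earlier added as an auxiliary term on the same computation graph.
```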

I would be interested in critical feedback on both sides. On the theory side: whether the variational argument is actually tight or is missing edge cases. On the practical side: whether this end-to-end view of clustering is something people would actually use, or whether standard K-means remains strictly better in real systems.

submitted by /u/Ffelixpe