I created a clustering algorithm, SPORE (Skeleton Propagation Over Recalibrating Expansions), for general-purpose clustering, intended to handle nonconvex, convex, low-dimensional, and high-dimensional data alike. I've benchmarked it on 28 datasets spanning 2 to 784 dimensions and released a Python package as well as a research paper.

Short Summary

SPORE is a density-variance-based method meant for general clustering in arbitrary geometries and dimensionalities. After building a k-NN graph, it runs two phases. Phase 1 (Expansion) uses BFS with a continually refined density-variance constraint to expand initial clusters in a way that adapts to their specific scale. The aim is to capture the inner, well-shielded skeletons and stay back from low-separation boundary areas. Phase 2 (Small-Cluster Reassignment, aka SCR) takes those boundary points and merges them into the skeletons they surround, and can draw sharp lines between adjacent cluster boundaries, much like k-means partitioning to the nearest centroid/representative. Together, these give SPORE scale-adaptive shape recognition plus sharp boundaries when clusters are near each other, so it strongly resists the merge-or-fragment problem of most density-based clustering algorithms. It's also quite robust to dimensionality, all the way up to hundreds of dimensions. I've even used it on 1000D+ LLM embeddings and gotten clean results (though, to be fair, LLM embeddings are often trained to be well separated despite being high-dimensional).

More In-depth

SPORE has 3 main steps, 2 of which are stages where the actual clustering occurs:
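The post describes the two-phase pipeline but not SPORE's exact rules, so the following is a minimal, dependency-free sketch of the idea under stated assumptions: brute-force exact k-NN stands in for any approximate index, a relative density band around a recalibrating running mean stands in for the actual density-variance constraint, and a nearest-point reassignment of tiny clusters stands in for SCR. All function names, thresholds, and defaults here are my own, not SPORE's.

```python
import math
from collections import deque

def knn_graph(points, k):
    """Brute-force k-nearest-neighbour lists; exact search keeps the sketch
    dependency-free (an approximate index would be used at scale)."""
    nbrs = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nbrs.append([j for _, j in dists[:k]])
    return nbrs

def local_density(points, nbrs):
    """Local density as the inverse mean distance to a point's k neighbours."""
    return [1.0 / (sum(math.dist(points[i], points[j]) for j in nbrs[i])
                   / len(nbrs[i]) + 1e-12)
            for i in range(len(points))]

def spore_like(points, k=3, band=0.5, min_size=3):
    nbrs = knn_graph(points, k)
    dens = local_density(points, nbrs)
    labels = [-1] * len(points)
    cid = 0
    # Phase 1 (Expansion-like): BFS from the densest unlabelled point, admitting
    # a neighbour only while its density stays within a band of the cluster's
    # running mean density, recalibrated as the cluster grows. (SPORE's actual
    # constraint is variance-based; this relative band is a stand-in.)
    for seed in sorted(range(len(points)), key=lambda i: -dens[i]):
        if labels[seed] != -1:
            continue
        labels[seed] = cid
        member, q = [dens[seed]], deque([seed])
        while q:
            i = q.popleft()
            mu = sum(member) / len(member)
            for j in nbrs[i]:
                if labels[j] == -1 and abs(dens[j] - mu) <= band * mu:
                    labels[j] = cid
                    member.append(dens[j])
                    q.append(j)
        cid += 1
    # Phase 2 (SCR-like): points stranded in tiny clusters are handed to the
    # cluster of their nearest point inside a large cluster, drawing a sharp
    # nearest-representative boundary.
    sizes = {}
    for l in labels:
        sizes[l] = sizes.get(l, 0) + 1
    big = [m for m in range(len(points)) if sizes[labels[m]] >= min_size]
    if big:
        labels = [labels[i] if sizes[labels[i]] >= min_size
                  else labels[min(big, key=lambda m: math.dist(points[i], points[m]))]
                  for i in range(len(points))]
    return labels
```

On two well-separated 2D blobs this recovers the two groups; the point of the running-mean recalibration is that the admission threshold adapts to each cluster's own density scale rather than using one global cutoff.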
[R] The SPORE Clustering Algorithm
Reddit r/MachineLearning / 4/1/2026
Key Points
- The article introduces SPORE, a general-purpose clustering algorithm designed to handle both nonconvex and convex cluster shapes across a wide range of dimensionalities.
- SPORE builds a k-NN graph (optionally using approximate neighbors via HNSW) and then runs two clustering phases: an expansion phase using BFS with a refined density-variance constraint, followed by a small-cluster reassignment step that merges boundary points into surrounding clusters.
- The method aims to avoid common failure modes of density-based clustering, particularly the merge-or-fragment problem, by staying away from low-separation boundary areas and drawing sharper lines between adjacent clusters.
- The author reports benchmarking on 28 datasets spanning 2 to 784 dimensions and claims robustness up to hundreds of dimensions, including a demonstration on very high-dimensional (1000D+) LLM embeddings.
- The work is shared as both a Python package on PyPI and an arXiv research paper, enabling others to reproduce and evaluate the approach.
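The "sharper lines between adjacent clusters" point above can be illustrated in isolation: once skeleton clusters exist, each leftover boundary point takes the label of its nearest skeleton member, analogous to k-means' nearest-centroid partition. The function name and data layout below are illustrative assumptions, not SPORE's actual API.

```python
import math

def reassign_boundary(boundary, skeletons):
    """Give each boundary point the label of its nearest skeleton point.
    `skeletons` maps cluster id -> list of skeleton points (assumed layout)."""
    labels = []
    for p in boundary:
        _, best = min((math.dist(p, q), cid)
                      for cid, pts in skeletons.items() for q in pts)
        labels.append(best)
    return labels
```

With skeleton representatives at x=0 and x=10, a boundary point at x=4.9 snaps to the left cluster: the split falls at the midpoint between the nearest representatives, which is what produces the sharp boundary.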




