[P] EVōC: Embedding Vector Oriented Clustering

Reddit r/MachineLearning / 4/1/2026


Key Points

  • EVōC is a new Python library focused on clustering very high-dimensional embedding vectors, addressing common bottlenecks in both cluster quality and compute time.
  • The approach is built on UMAP and HDBSCAN, with components redesigned, tuned, and optimized specifically for embedding-vector clustering.
  • The author claims EVōC can produce better clustering quality than a typical UMAP + HDBSCAN pipeline while running in a fraction of the time.
  • EVōC is also reported to be competitive in scaling performance with scikit-learn’s MiniBatchKMeans.
  • The project is available via GitHub, documentation, and PyPI for immediate use and experimentation.

I have written a new library that specifically targets the problem of clustering embedding vectors. This is often a challenging task: embedding vectors are very high dimensional, and classical clustering algorithms can struggle as a result, in terms of either cluster quality or compute time.
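One commonly cited reason high dimensionality hurts distance-based clustering is distance concentration: as the dimension grows, the nearest and farthest points from a query end up at nearly the same distance. The post does not name this as its explanation, so treat the following as an illustrative sketch only. It uses uniform random points (real embeddings usually have more structure), and `relative_contrast` is a hypothetical helper name.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for distances from one random
    query point to n_points random points in the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


# Compare a low-dimensional space with embedding-sized ones.
for dim in (2, 32, 768):
    print(dim, round(relative_contrast(dim), 3))
```

The printed contrast shrinks as `dim` grows, which is the sense in which "near" and "far" become harder to tell apart for algorithms that rely on raw distances.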

EVōC builds on foundations such as UMAP and HDBSCAN, redesigned, tuned, and optimized specifically for the task of clustering embedding vectors. If you currently use UMAP + HDBSCAN for embedding vector clustering, EVōC can provide better quality results in a fraction of the time. In fact, EVōC's performance scaling is competitive with sklearn's MiniBatchKMeans.
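The timing claim is straightforward to check on your own embeddings. Below is a minimal sketch of a timing harness, assuming every clusterer exposes a scikit-learn style `fit_predict` method and labels noise points as -1 (the HDBSCAN convention). `compare_clusterers` is a hypothetical helper, not part of EVōC, and it measures only wall-clock time and cluster count, not cluster quality.

```python
import time


def compare_clusterers(clusterers, data):
    """Time fit_predict for each named clusterer on the same data.

    clusterers: dict mapping a display name to an object with a
                fit_predict(data) method (assumed interface).
    Returns a dict of name -> {"seconds": float, "n_clusters": int}.
    """
    results = {}
    for name, clusterer in clusterers.items():
        start = time.perf_counter()
        labels = clusterer.fit_predict(data)
        elapsed = time.perf_counter() - start
        # Count distinct labels, ignoring -1 (assumed to mean noise).
        label_set = {int(label) for label in labels}
        label_set.discard(-1)
        results[name] = {"seconds": elapsed, "n_clusters": len(label_set)}
    return results
```

With the relevant packages installed, one could pass in an EVōC clusterer next to `sklearn.cluster.MiniBatchKMeans` or an existing UMAP + HDBSCAN wrapper. The exact EVōC class name and constructor arguments are not given in this post, so take them from the docs linked below rather than from this sketch.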

GitHub: https://github.com/TutteInstitute/evoc

Docs: https://evoc.readthedocs.io

PyPI: https://pypi.org/project/evoc/

submitted by /u/lmcinnes