[R] VLouvain: Louvain Community Detection Directly on Vectors, No Graph Construction

Reddit r/MachineLearning / 3/24/2026

💬 OpinionIdeas & Deep AnalysisTools & Practical UsageModels & Research

Key Points

  • VLouvain re-derives Louvain community detection to operate directly on an embedding matrix, avoiding explicit similarity-graph construction and the resulting O(n^2) edge explosion.
  • The method computes degree and modularity gains from community-level vector sums, achieving O(n·d) state while remaining mathematically identical to standard Louvain.
  • Benchmarks on Amazon Products (1.57M nodes, d=200) report ~11,300s runtime for VLouvain, while multiple graph/parallel libraries fail well below that scale.
  • The paper finds that Top-K similarity sparsification with FAISS does not preserve meaningful structure (very low NMI vs the full graph), implying naive truncation yields near-random communities.
  • As a GraphRAG drop-in, VLouvain reduces graph construction/indexing time from ~3 hours to ~5.3 minutes and improves MultiHopRAG retrieval recall from 37.9% to 48.8%.

You have embeddings for your objects. You want to build a similarity graph and find communities, whether for GraphRAG, a recommender system, or just finding structure in your data. So you compute pairwise similarities, build the graph, run Louvain. Except now you have O(n^2) edges and everything crashes above ~15K nodes.

VLouvain reformulates Louvain to work directly on the embedding matrix. Degrees and modularity gains are computed from community-level vector sums, no edges involved. You maintain O(n*d) state instead of O(n^2). The result is mathematically identical to standard Louvain, not an approximation.

On Amazon Products (1.57M nodes, d=200), VLouvain completes in ~11,300 seconds. Every other method we tested (cuGraph, iGraph, GVE, NetworKit) fails before reaching half that scale.

One thing we didn't expect: Top-K sparsification doesn't save you. We built exact and approximate Top-K graphs via FAISS, and even at K=256 the partitions had NMI ~0.04 against the full graph. If you're truncating your similarity graph to make Louvain feasible, you're getting back essentially random communities.

As a drop-in replacement for graph construction in GraphRAG, indexing went from 3 hours to 5.3 minutes, retrieval recall improved from 37.9% to 48.8% on MultiHopRAG.

Paper (EDBT 2026): https://openproceedings.org/2026/conf/edbt/paper-72.pdf

Code: https://github.com/yutengkai/VLouvain

submitted by /u/Greedy-Teach1533
[link] [comments]