Matrix Factorization Framework for Community Detection under the Degree-Corrected Block Model

arXiv stat.ML / 4/29/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

Key Points

  • The paper reframes community detection under the degree-corrected block model (DCBM) as a constrained nonnegative matrix factorization (NMF) problem, enabling a new inference approach.
  • It introduces a novel community detection method along with a theoretically grounded initialization strategy that supplies a strong starting estimate for downstream DCBM inference algorithms.
  • The method is designed to be agnostic to specific graph structures, provided the network can be represented by a DCBM, making it broadly applicable.
  • Experiments on synthetic and real benchmarks show the approach achieves community detection results comparable to standard DCBM inference while running much faster (e.g., ~4 minutes for 100,000 nodes and 1,000,000 edges).
  • The proposed initialization improves solution quality and reduces iteration counts across all tested inference algorithms, addressing both robustness and computational efficiency concerns.

Abstract

Community detection is a fundamental task in data analysis, and block models provide an approach for identifying a wide variety of community structures while offering high interpretability. The degree-corrected block model (DCBM) is an established model that accounts for the heterogeneity of node degrees. However, inference methods are computationally costly and highly sensitive to initialization, while cheaper alternatives, such as spectral or modularity-based approaches, are restricted to detecting specific structures, typically assortative. In this work, we show that DCBM inference can be reformulated as a constrained nonnegative matrix factorization problem. Leveraging this insight, we propose a novel method for community detection and a theoretically well-grounded initialization strategy that provides an initial estimate of communities for inference algorithms. Our approach is agnostic to any specific network structure and applies to graphs with any structure representable by a DCBM. Experiments on synthetic and real benchmark networks show that our method detects communities comparable to those found by DCBM inference while being faster; for instance, it processes a graph with 100,000 nodes and 1,000,000 edges in approximately 4 minutes. Moreover, the proposed initialization strategy significantly improves solution quality and reduces the number of iterations required by all tested inference algorithms. Overall, this work provides a scalable and robust framework for community detection and highlights the benefits of a matrix-factorization perspective for the DCBM.