Clusterpath Gaussian Graphical Modeling

arXiv stat.ML / 3/25/2026

Key Points

  • The paper introduces the Clusterpath estimator for Gaussian Graphical Models (CGGM), which uses an aggregation penalty to encourage data-driven clustering of variables in the graph structure.
  • By encouraging a clustered structure, CGGM yields a block-structured precision matrix whose block structure carries over to the covariance matrix, which improves interpretability and helps control estimation uncertainty.
  • The estimator is posed as a convex optimization problem, enabling straightforward incorporation of additional penalization terms such as combinations of aggregation and sparsity.
  • A cyclic block coordinate descent algorithm is presented to compute CGGM efficiently, and simulations show that it matches and often outperforms existing state-of-the-art approaches for variable clustering in graphical models.
  • The authors demonstrate CGGM on a diverse collection of empirical applications, highlighting practical advantages and versatility beyond synthetic benchmarks.
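
The block-structure claim in the key points can be checked numerically on a toy example. The sketch below is not the CGGM estimator; it only builds a small precision matrix with a cluster-induced block structure and verifies that its inverse (the covariance matrix) has the same structure. The helper names (block_precision, has_block_structure), the cluster labels, and all numeric values are hypothetical and chosen purely for illustration.

```python
import numpy as np

def block_precision(labels, within_diag, R):
    """Build a precision matrix with a cluster-induced block structure.

    labels[i]      : cluster index of variable i (illustrative input)
    within_diag[k] : diagonal entry shared by all variables in cluster k
    R[k][l]        : off-diagonal entry shared by all pairs (i, j), i != j,
                     with i in cluster k and j in cluster l
    """
    labels = np.asarray(labels)
    R = np.asarray(R, dtype=float)
    theta = R[np.ix_(labels, labels)].copy()
    np.fill_diagonal(theta, np.asarray(within_diag, dtype=float)[labels])
    return theta

def has_block_structure(M, labels, tol=1e-10):
    """Check that entries of M depend only on the clusters of their indices
    (diagonal and off-diagonal entries are checked separately)."""
    labels = np.asarray(labels)
    off_diag = ~np.eye(len(labels), dtype=bool)
    clusters = np.unique(labels)
    for k in clusters:
        d = np.diag(M)[labels == k]
        if d.max() - d.min() > tol:
            return False
        for l in clusters:
            vals = M[np.outer(labels == k, labels == l) & off_diag]
            if vals.size and vals.max() - vals.min() > tol:
                return False
    return True

# Toy example: five variables in two clusters (values are arbitrary but
# chosen so that the matrix is positive definite).
labels = [0, 0, 0, 1, 1]
theta = block_precision(labels, within_diag=[2.0, 3.0],
                        R=[[0.5, -0.2], [-0.2, 0.8]])
sigma = np.linalg.inv(theta)  # covariance implied by the precision matrix

print(has_block_structure(theta, labels))
print(has_block_structure(sigma, labels))
```

Both checks should report that the structure holds: a matrix of this form is unchanged when variables are permuted within a cluster, and its inverse inherits that invariance.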

Abstract

Graphical models serve as effective tools for visualizing conditional dependencies between variables. However, as the number of variables grows, interpretation becomes increasingly difficult, and estimation uncertainty increases due to the large number of parameters relative to the number of observations. To address these challenges, we introduce the Clusterpath estimator of the Gaussian Graphical Model (CGGM), which encourages variable clustering in the graphical model in a data-driven way. Through the use of an aggregation penalty, we group variables together, which in turn results in a block-structured precision matrix whose block structure is preserved in the covariance matrix. The CGGM estimator is formulated as the solution to a convex optimization problem, making it easy to incorporate other popular penalization schemes, which we illustrate through the combination of an aggregation and sparsity penalty. We present a computationally efficient implementation of the CGGM estimator by using a cyclic block coordinate descent algorithm. In simulations, we show that CGGM not only matches, but oftentimes outperforms other state-of-the-art methods for variable clustering in graphical models. We also demonstrate CGGM's practical advantages and versatility on a diverse collection of empirical applications.
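
The abstract does not state the objective function, so the following is only a schematic sketch of what a convex estimator combining an aggregation penalty with a sparsity penalty could look like. The first two terms are the standard Gaussian negative log-likelihood (up to constants), with S the sample covariance matrix and Θ the precision matrix. Everything else is an assumption made for illustration: the tuning parameters λ_c and λ_s, the pairwise weights w_ij, the generic convex dissimilarity D_ij(Θ) between variables i and j (zero when the two variables are merged into one cluster), and the graphical-lasso-style ℓ1 term.

```latex
\hat{\Theta}
  = \operatorname*{arg\,min}_{\Theta \succ 0}\;
    \underbrace{-\log\det\Theta + \operatorname{tr}(S\Theta)}_{\text{Gaussian negative log-likelihood}}
    \;+\; \lambda_c \underbrace{\sum_{i<j} w_{ij}\, D_{ij}(\Theta)}_{\text{aggregation penalty (assumed form)}}
    \;+\; \lambda_s \underbrace{\sum_{i \neq j} \lvert \theta_{ij} \rvert}_{\text{sparsity penalty (assumed form)}}
```

Under these assumptions each term is convex in Θ, so their sum is convex as well, which is the property that makes it easy to add further penalties. Increasing λ_c would merge more variables into shared clusters, while λ_s would shrink individual entries toward zero.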