Detection of local geometry in random graphs: information-theoretic and computational limits

arXiv stat.ML / 3/26/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

Key Points

  • The paper studies detection of a “local geometry” signal in a mixed random-graph model, where a hidden community is generated by a random geometric graph on the sphere while the rest follows Erdős–Rényi edges.
  • Because the model is constructed so that edge marginals match between the geometric and null cases, the only detectable signal comes from higher-order edge dependencies rather than individual edge probabilities.
  • The authors derive both information-theoretic and computational limits for detection, using signed triangle-count tests for upper bounds and advanced lower-bound techniques based on Wishart–GOE comparison and KL divergence tensorization.
  • They characterize the detection threshold (for fixed p) as d = Õ(k^2 ∨ k^6/n^3) and show how this extends previous results from the full model (k = n) to regimes with vanishing p.
  • On the algorithmic side, the work identifies a computational–statistical gap, provides evidence using the low-degree polynomial framework, and argues that signed cycle counts of length ℓ ≥ 4 are suboptimal.

Abstract

We study the problem of detecting local geometry in random graphs. We introduce a model \mathcal{G}(n, p, d, k), where a hidden community of average size k has edges drawn as a random geometric graph on \mathbb{S}^{d-1}, while all remaining edges follow the Erd\H{o}s--R\'enyi model \mathcal{G}(n, p). The random geometric graph is generated by thresholding inner products of latent vectors on \mathbb{S}^{d-1}, with each edge having marginal probability equal to p. This implies that \mathcal{G}(n, p, d, k) and \mathcal{G}(n, p) are indistinguishable at the level of the marginals, and the signal lies entirely in the edge dependencies induced by the local geometry. We investigate both the information-theoretic and computational limits of detection. On the information-theoretic side, our upper bounds follow from three tests based on signed triangle counts: a global test, a scan test, and a constrained scan test; our lower bounds follow from two complementary methods: truncated second moment via Wishart--GOE comparison, and tensorization of KL divergence. These results together settle the detection threshold at d = \widetilde{\Theta}(k^2 \vee k^6/n^3) for fixed p, and extend the state-of-the-art bounds from the full model (i.e., k = n) for vanishing p. On the computational side, we identify a computational--statistical gap and provide evidence via the low-degree polynomial framework, as well as the suboptimality of signed cycle counts of length \ell \geq 4.