A Data-Informed Variational Clustering Framework for Noisy High-Dimensional Data

arXiv stat.ML / 4/9/2026


Key Points

  • DIVI is proposed as a variational clustering framework designed for noisy, high-dimensional data where only a small subset of features is informative and the cluster count is unknown.
  • The method combines global feature gating with split-based adaptive structure growth, learning feature relevance differentiably while using informative priors to stabilize optimization.
  • DIVI controls model complexity by expanding only when local diagnostics suggest underfitting, helping avoid instability and over-sensitivity to noisy dimensions common in likelihood-based approaches.
  • The work evaluates practical behavior by analyzing runtime scalability and parameter sensitivity, and reports competitive clustering performance with interpretable gating and conservative growth.
  • The authors also identify failure regimes and position DIVI as a practical variational approach rather than a fully Bayesian generative model.
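The "global feature gating" idea in the points above can be illustrated concretely: each feature gets a single gate in (0, 1), shared across clusters, that scales its contribution to the per-cluster log-likelihood, so irrelevant dimensions can be softly switched off by gradient descent. The sketch below is a minimal NumPy toy under assumed diagonal-Gaussian clusters, not DIVI's actual variational objective; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def gated_loglik(X, means, log_vars, gate_logits):
    """Per-point, per-cluster diagonal-Gaussian log-likelihood with a
    global sigmoid gate g_d in (0, 1) down-weighting each feature's
    contribution. Because the gates are differentiable, feature
    relevance can be learned jointly with the cluster parameters.
    (Toy sketch only; DIVI's objective is variational and richer.)

    X: (N, D) data; means, log_vars: (K, D); gate_logits: (D,).
    Returns an (N, K) array of gated log-likelihoods."""
    g = 1.0 / (1.0 + np.exp(-gate_logits))              # (D,) gates
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2    # (N, K, D)
    per_dim = -0.5 * (log_vars + diff2 / np.exp(log_vars)
                      + np.log(2.0 * np.pi))            # per-feature terms
    return (g * per_dim).sum(axis=-1)                   # gate, then sum
```

With strongly negative gate logits a feature's contribution vanishes, which is the sense in which only the informative subset of dimensions drives the partition.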

Abstract

Clustering in high-dimensional settings with severe feature noise remains challenging, especially when only a small subset of dimensions is informative and the final number of clusters is not specified in advance. In such regimes, partition recovery, feature relevance learning, and structural adaptation are tightly coupled, and standard likelihood-based methods can become unstable or overly sensitive to noisy dimensions. We propose DIVI, a data-informed variational clustering framework that combines global feature gating with split-based adaptive structure growth. DIVI uses informative prior initialization to stabilize optimization, learns feature relevance in a differentiable manner, and expands model complexity only when local diagnostics indicate underfitting. Beyond clustering performance, we also examine runtime scalability and parameter sensitivity in order to clarify the computational and practical behavior of the framework. Empirically, we find that DIVI performs competitively under severe feature noise, remains computationally feasible, and yields interpretable feature-gating behavior, while also exhibiting conservative growth and identifiable failure regimes in challenging settings. Overall, DIVI is best viewed as a practical variational clustering framework for noisy high-dimensional data rather than as a fully Bayesian generative solution.
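The abstract's "expand model complexity only when local diagnostics indicate underfitting" can be sketched as a conservative split step: score each cluster with a local misfit diagnostic and split only the worst cluster, and only when its score exceeds a threshold. The paper does not specify its diagnostics, so the mean squared distance to the cluster center and the principal-axis split below are stand-in assumptions, as are the function name and threshold parameter.

```python
import numpy as np

def maybe_split(X, assign, means, misfit_threshold):
    """Conservative growth step (toy stand-in for DIVI's diagnostics):
    score each cluster by its mean squared distance to the center, and
    split only the single worst-fitting cluster, only if its score
    exceeds misfit_threshold. Returns a possibly-grown means array."""
    K = means.shape[0]
    scores = np.full(K, -np.inf)
    for k in range(K):
        pts = X[assign == k]
        if len(pts) >= 2:
            scores[k] = np.mean(np.sum((pts - means[k]) ** 2, axis=1))
    worst = int(np.argmax(scores))
    if scores[worst] <= misfit_threshold:
        return means                       # no evidence of local underfit
    pts = X[assign == worst]
    # split the worst cluster along its principal axis, one std apart
    _, _, vt = np.linalg.svd(pts - means[worst], full_matrices=False)
    offset = vt[0] * np.std(pts @ vt[0])
    grown = np.vstack([means, means[worst] + offset])
    grown[worst] = means[worst] - offset
    return grown
```

Gating growth on a per-cluster diagnostic, rather than splitting on a schedule, is what keeps the structure search conservative: with a well-fit partition the step is a no-op and the cluster count stays put.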