Symbolic Density Estimation: A Decompositional Approach

arXiv cs.LG / 3/31/2026


Key Points

  • The paper introduces AI-Kolmogorov, a framework for Symbolic Density Estimation (SymDE) aimed at generating interpretable symbolic expressions for probability densities rather than only point predictions.
  • It proposes a multi-stage pipeline that decomposes the problem (via clustering and/or probabilistic graphical model structure learning), performs nonparametric density estimation, estimates support, and then applies symbolic regression to the estimated density.
  • The approach is evaluated on synthetic mixture models, multivariate normal distributions, and three nonstandard “exotic” distributions, two of which are motivated by high-energy physics use cases.
  • Results indicate the method can either recover underlying component distributions or produce mathematically meaningful symbolic expressions that provide insight into the data-generating processes.
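
One way to read the decomposition step is through two standard factorizations. The summary does not state these formulas, so the notation here (mixture weights \pi_k, component densities p_k, parent sets pa(i)) is an assumption for illustration, and the paper's graphical-model structure may differ (for example, it could be undirected).

```latex
% Clustering view: the density is a mixture of K simpler components
p(x) = \sum_{k=1}^{K} \pi_k \, p_k(x),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% Graphical-model view (assuming a directed acyclic structure):
% the joint density factorizes into lower-dimensional conditionals
p(x_1, \dots, x_d) = \prod_{i=1}^{d} p\bigl(x_i \mid x_{\mathrm{pa}(i)}\bigr)
```

Under either view, each factor p_k or p(x_i | x_pa(i)) involves fewer modes or fewer variables than the full density, which is presumably what makes the later symbolic regression step more tractable.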

Abstract

We introduce AI-Kolmogorov, a novel framework for Symbolic Density Estimation (SymDE). Symbolic regression (SR) has been effectively used to produce interpretable models in standard regression settings, but its applicability to density estimation tasks has largely been unexplored. To address the SymDE task, we introduce a multi-stage pipeline: (i) problem decomposition through clustering and/or probabilistic graphical model structure learning; (ii) nonparametric density estimation; (iii) support estimation; and finally (iv) SR on the density estimate. We demonstrate the efficacy of AI-Kolmogorov on synthetic mixture models, multivariate normal distributions, and three exotic distributions, two of which are motivated by applications in high-energy physics. We show that AI-Kolmogorov can discover underlying distributions or otherwise provide valuable insight into the mathematical expressions describing them.
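
The four stages can be illustrated with a toy one-dimensional sketch. This is not the AI-Kolmogorov implementation: every function name below is hypothetical, the clustering is plain k-means, the support estimate is just the observed sample range, and stage (iv) is replaced by a much simpler stand-in that picks the best of three fixed closed forms (Gaussian, Laplace, uniform) instead of running a real symbolic regression search. It uses only the Python standard library and assumes every cluster contains at least two distinct points.

```python
import math
import random


def kmeans_1d(xs, k=2, iters=25):
    """Stage (i), simplified: partition 1-D samples into k clusters."""
    srt = sorted(xs)
    # Start the centres at evenly spaced quantiles of the data.
    centres = [srt[int((i + 0.5) * len(srt) / k)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centres[c]))
            groups[nearest].append(x)
        centres = [sum(g) / len(g) if g else centres[i]
                   for i, g in enumerate(groups)]
    return groups


def kde(xs):
    """Stage (ii): Gaussian kernel density estimate, rule-of-thumb bandwidth."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    h = 1.06 * sd * n ** -0.2  # normal-reference bandwidth
    scale = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return lambda t: scale * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in xs)


def support(xs):
    """Stage (iii), crudely: the observed range of the samples."""
    return min(xs), max(xs)


def best_closed_form(xs, density, lo, hi, grid=200):
    """Stage (iv) stand-in: choose among three fixed forms by squared error
    against the density estimate on a grid over the estimated support."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    b = sum(abs(x - mu) for x in xs) / n  # Laplace scale from mean abs. deviation
    forms = {
        "gaussian": lambda t: math.exp(-0.5 * ((t - mu) / sd) ** 2)
        / (sd * math.sqrt(2 * math.pi)),
        "laplace": lambda t: math.exp(-abs(t - mu) / b) / (2 * b),
        "uniform": lambda t: 1.0 / (hi - lo),
    }
    ts = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    target = [density(t) for t in ts]
    errors = {name: sum((f(t) - p) ** 2 for t, p in zip(ts, target))
              for name, f in forms.items()}
    name = min(errors, key=errors.get)
    return {"form": name, "mu": mu, "sd": sd, "support": (lo, hi)}


def symde_1d(xs, k=2):
    """Run the four simplified stages on each cluster in turn."""
    results = []
    for cluster in kmeans_1d(xs, k):
        lo, hi = support(cluster)
        results.append(best_closed_form(cluster, kde(cluster), lo, hi))
    return results


# Demo on a synthetic two-component Gaussian mixture.
random.seed(0)
data = ([random.gauss(-4.0, 1.0) for _ in range(300)]
        + [random.gauss(4.0, 0.5) for _ in range(300)])
results = symde_1d(data, k=2)
for r in results:
    print(r)
```

Decomposing first means the final fitting step only has to explain one well-separated component at a time. A genuine symbolic regression engine would search over expression trees rather than a fixed list of forms, so the last stage here should be read only as a placeholder for that search.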