SymCircuit: Bayesian Structure Inference for Tractable Probabilistic Circuits via Entropy-Regularized Reinforcement Learning

arXiv cs.LG / 3/24/2026


Key Points

  • SymCircuit addresses probabilistic circuit (PC) structure learning by replacing greedy, irreversible search with a learned generative policy trained using entropy-regularized reinforcement learning.
  • The work frames the approach as RL-as-inference, showing the optimal policy corresponds to a tempered Bayesian posterior and can recover the exact posterior when the temperature scales inversely with dataset size.
  • SymCircuit introduces SymFormer, a grammar-constrained autoregressive Transformer with tree-relative self-attention that guarantees valid circuit structures at every generation step.
  • Using option-level REINFORCE, the method restricts gradient updates to structural decisions rather than all tokens, which improves the signal-to-noise ratio and yields a more than 10× gain in sample efficiency on the NLTCS dataset.
  • The paper also develops a three-layer uncertainty decomposition (structural, parametric, and leaf) grounded in the multilinear polynomial structure of PC outputs.
  • On NLTCS, SymCircuit closes 93% of the gap to LearnSPN, and preliminary results on Plants (69 variables) suggest scalability.
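The tempered-posterior claim in the second point can be checked numerically on a toy example. The sketch below rests on assumptions that the summary does not state: the reward R(S) is the mean per-example log-likelihood, log p(D | S) / N, the prior over structures is uniform, and the entropy-regularized optimum takes the standard maximum-entropy form π*(S) ∝ exp(R(S) / τ). Under those assumptions, setting τ = 1/N gives π*(S) ∝ p(D | S), which is the exact posterior. The function names and the log-likelihood values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def optimal_policy(log_liks, n_examples, temperature):
    """Maximum-entropy optimal policy, pi*(S) proportional to exp(R(S) / temperature).

    Assumption: R(S) is the mean per-example log-likelihood, log p(D | S) / N.
    """
    rewards = [ll / n_examples for ll in log_liks]
    return softmax([r / temperature for r in rewards])

# Hypothetical total log-likelihoods log p(D | S) for three candidate structures.
log_liks = [-620.0, -615.0, -617.5]
n_examples = 100

# Exact posterior under a uniform prior over the three candidates.
posterior = softmax(log_liks)

# temperature = 1/N recovers the posterior; a higher temperature flattens it.
exact = optimal_policy(log_liks, n_examples, temperature=1.0 / n_examples)
flat = optimal_policy(log_liks, n_examples, temperature=1.0)
```

With a temperature larger than 1/N the policy spreads its mass more evenly across structures, which is the tempering effect; with a smaller one it concentrates on the best-scoring structure.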

Abstract

Probabilistic circuit (PC) structure learning is hampered by greedy algorithms that make irreversible, locally optimal decisions. We propose SymCircuit, which replaces greedy search with a learned generative policy trained via entropy-regularized reinforcement learning. Instantiating the RL-as-inference framework in the PC domain, we show the optimal policy is a tempered Bayesian posterior, recovering the exact posterior when the regularization temperature is set inversely proportional to the dataset size. The policy is implemented as SymFormer, a grammar-constrained autoregressive Transformer with tree-relative self-attention that guarantees valid circuits at every generation step. We introduce option-level REINFORCE, restricting gradient updates to structural decisions rather than all tokens, yielding a signal-to-noise ratio (SNR) improvement and a more than 10× gain in sample efficiency on the NLTCS dataset. A three-layer uncertainty decomposition (structural via model averaging, parametric via the delta method, leaf via conjugate Dirichlet-Categorical propagation) is grounded in the multilinear polynomial structure of PC outputs. On NLTCS, SymCircuit closes 93% of the gap to LearnSPN; preliminary results on Plants (69 variables) suggest scalability.
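The leaf layer of the decomposition relies on Dirichlet-Categorical conjugacy, which has a closed form. The following is a minimal sketch of the update for a single categorical leaf, assuming a symmetric Dirichlet prior. How SymCircuit propagates this uncertainty through sum and product nodes is not described in the abstract and is not shown here. The function name, prior value, and counts are illustrative.

```python
def dirichlet_leaf_posterior(counts, prior_alpha=1.0):
    """Posterior mean and variance of a categorical leaf's parameters.

    With a symmetric Dirichlet(prior_alpha) prior and observed category
    counts, the posterior is Dirichlet(prior_alpha + counts).
    """
    alphas = [prior_alpha + c for c in counts]
    alpha0 = sum(alphas)
    means = [a / alpha0 for a in alphas]
    # Marginal variance of each component of a Dirichlet distribution.
    variances = [a * (alpha0 - a) / (alpha0 ** 2 * (alpha0 + 1.0)) for a in alphas]
    return means, variances

# Hypothetical binary leaf: value 0 seen 30 times, value 1 seen 10 times.
means, variances = dirichlet_leaf_posterior([30, 10])

# Ten times the data in the same proportions shrinks the leaf uncertainty.
_, variances_more = dirichlet_leaf_posterior([300, 100])
```

The posterior variance falls roughly in proportion to the total count, so leaf uncertainty matters most for leaves that see few training examples.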