Sparse Weak-Form Discovery of Stochastic Generators

arXiv stat.ML / 3/24/2026


Key Points

  • The paper proposes a unified framework for identifying stochastic differential equations (SDEs) by combining weak-form integration-by-parts (Weak SINDy) with stochastic system identification (stochastic SINDy).
  • Its key innovation is using spatial Gaussian test functions instead of temporal ones. Each noise term in the projected response then has zero conditional mean given the current state, which keeps the regression unbiased in expectation.
  • The method reformulates SDE discovery as two sparse linear systems, one for the drift and one for the diffusion tensor, that share a single design matrix and are solved jointly via ℓ1-regularized regression with grouped cross-validation.
  • It includes a two-step bias-correction procedure to handle state-dependent diffusion.
  • Experiments on three benchmarks (Ornstein–Uhlenbeck, double-well Langevin, and multiplicative diffusion) report recovery of all active polynomial generators with coefficient errors below 4%, stationary-density total-variation distances below 0.01, and autocorrelation functions that reproduce the true relaxation timescales.
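
The points above can be illustrated with a minimal sketch of the spatial-kernel projection on a simulated Ornstein–Uhlenbeck path. Everything specific here is an assumption made for illustration and is not taken from the paper: the parameter values, the kernel centers and bandwidth, the cubic monomial library, the convention a(x) = σ², and the function names. Sequentially thresholded least squares is used as a simple stand-in for the ℓ1-regularized regression with grouped cross-validation, and the bias-correction step is omitted.

```python
import numpy as np

# Hypothetical benchmark parameters (illustrative, not from the paper).
THETA, SIGMA, DT, N_STEPS = 1.0, 1.0, 0.02, 200_000

def simulate_ou(theta, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama path of dX = -theta * X dt + sigma dW."""
    rng = np.random.default_rng(seed)
    dw = rng.standard_normal(n_steps) * np.sqrt(dt)
    x = np.zeros(n_steps + 1)
    for n in range(n_steps):
        x[n + 1] = x[n] - theta * x[n] * dt + sigma * dw[n]
    return x

def weak_form_systems(x, dt, centers, h, degree=3):
    """Project increments onto spatial Gaussian kernels K_j(x)."""
    x_now, dx = x[:-1], np.diff(x)
    # K[j, n] = K_j(X_{t_n}); it depends only on the current state.
    K = np.exp(-((x_now[None, :] - centers[:, None]) ** 2) / (2.0 * h**2))
    # Monomial library 1, x, ..., x^degree evaluated at the current state.
    Phi = np.vander(x_now, degree + 1, increasing=True)
    G = (K @ Phi) * dt   # shared design matrix, shape (J, degree + 1)
    r_drift = K @ dx     # projected first-moment response
    r_diff = K @ dx**2   # projected second-moment response
    return G, r_drift, r_diff

def thresholded_lstsq(G, r, threshold, n_iter=10):
    """Sequentially thresholded least squares (stand-in for l1 regression)."""
    coef = np.linalg.lstsq(G, r, rcond=None)[0]
    for _ in range(n_iter):
        active = np.abs(coef) >= threshold
        coef = np.zeros_like(coef)
        if not active.any():
            break
        # Refit using only the library terms that survived the threshold.
        coef[active] = np.linalg.lstsq(G[:, active], r, rcond=None)[0]
    return coef

x = simulate_ou(THETA, SIGMA, DT, N_STEPS)
centers = np.linspace(-1.5, 1.5, 15)  # assumed kernel centers x_j
G, r_drift, r_diff = weak_form_systems(x, DT, centers, h=0.4)

# Both systems reuse the same design matrix G.
b_coef = thresholded_lstsq(G, r_drift, threshold=0.15)  # drift b(x)
a_coef = thresholded_lstsq(G, r_diff, threshold=0.15)   # diffusion a(x)
print("drift coefficients     (1, x, x^2, x^3):", b_coef)
print("diffusion coefficients (1, x, x^2, x^3):", a_coef)
```

For these assumed parameters the true drift is b(x) = -x and the true diffusion is a(x) = 1, so the coefficient on x in `b_coef` should sit near -1 and the constant in `a_coef` near 1, up to sampling noise and a small discretisation term of order Δt in the second moment.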

Abstract

We introduce a framework for the data-driven discovery of stochastic differential equations (SDEs) that unifies, for the first time, the weak-form integration-by-parts approach of Weak SINDy with the stochastic system identification goal of stochastic SINDy. The central novelty is the adoption of spatial Gaussian test functions K_j(x)=\exp(-|x-x_j|^2/(2h^2)) in place of temporal test functions. Because the kernel weight K_j(X_{t_n}) is \mathcal{F}_{t_n}-measurable and the Brownian innovation \xi_n is independent of \mathcal{F}_{t_n}, every noise term in the projected response has zero conditional mean given the current state -- a property that guarantees unbiasedness in expectation and prevents the structural regression bias that afflicts temporal test functions in the stochastic setting. This design choice converts the SDE identification problem into two sparse linear systems -- one for the drift b(x) and one for the diffusion tensor a(x) -- that share a single design matrix and are solved jointly via \ell_1-regularised regression with grouped cross-validation. A two-step bias-correction procedure handles state-dependent diffusion. Validated on the Ornstein--Uhlenbeck process, the double-well Langevin system, and a multiplicative diffusion process, the method recovers all active polynomial generators with coefficient errors below 4\%, stationary-density total-variation distances below 0.01, and autocorrelation functions that faithfully reproduce true relaxation timescales across all three benchmarks.
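
The zero-conditional-mean argument in the abstract can be written out in one line. The sketch below assumes an Euler-Maruyama discretisation with step \Delta t and a scalar diffusion coefficient \sigma(x); both symbols are notation introduced here for illustration and do not appear in the abstract.

```latex
% Assumed discretisation (the symbols \Delta t and \sigma are illustrative):
X_{t_{n+1}} - X_{t_n}
  = b(X_{t_n})\,\Delta t + \sigma(X_{t_n})\sqrt{\Delta t}\,\xi_n ,
\qquad \xi_n \text{ independent of } \mathcal{F}_{t_n}, \quad \mathbb{E}[\xi_n] = 0 .

% K_j(X_{t_n}) and \sigma(X_{t_n}) are \mathcal{F}_{t_n}-measurable,
% so they factor out of the conditional expectation:
\mathbb{E}\!\left[ K_j(X_{t_n})\,\sigma(X_{t_n})\sqrt{\Delta t}\,\xi_n
  \,\middle|\, \mathcal{F}_{t_n} \right]
  = K_j(X_{t_n})\,\sigma(X_{t_n})\sqrt{\Delta t}\;\mathbb{E}[\xi_n]
  = 0 .
```

Under this assumed discretisation, the tower property then gives the projected drift response \sum_n K_j(X_{t_n})(X_{t_{n+1}} - X_{t_n}) the same expectation as \sum_n K_j(X_{t_n})\,b(X_{t_n})\,\Delta t, which is the sense in which the kernel-weighted noise does not bias the regression.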
