Fast and principled equation discovery from chaos to climate

arXiv cs.LG / 4/15/2026


Key Points

  • The paper introduces Bayesian-ARGOS, a hybrid framework for automated equation discovery from noisy, limited observations that unifies fast sparse-regression screening with focused Bayesian inference and uncertainty quantification.
  • Experiments on seven chaotic systems show Bayesian-ARGOS beats two state-of-the-art baselines in most settings, improving data efficiency over SINDy and reducing computational cost by about two orders of magnitude versus bootstrap-based ARGOS.
  • The Bayesian formulation enables standard statistical diagnostics such as influence analysis and multicollinearity detection, helping reveal failure modes that are difficult to detect with purely library-based sparse regression.
  • When combined with representation learning (SINDy-SHRED) for high-dimensional sea-surface-temperature reconstruction tied to climate dynamics, Bayesian-ARGOS increases the yield of valid latent equations and improves their long-horizon stability.
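The hybrid idea in the first point (cheap sparse-regression screening, then Bayesian inference restricted to the surviving terms) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Bayesian-ARGOS implementation: the screen here is sequentially thresholded least squares (the SINDy-style solver), standing in for whatever screening step the paper actually uses; the Bayesian stage is a conjugate Gaussian linear model with a plug-in noise variance; and the data are noisy derivatives of the Lorenz y-equation, dy/dt = 28x - y - xz, evaluated at states sampled uniformly from a box rather than along an integrated trajectory. The names `poly_library`, `stlsq_screen`, and `bayesian_refit`, the threshold, and the prior variance are all hypothetical choices.

```python
import numpy as np

def poly_library(X):
    """Hypothetical candidate library: all polynomials up to degree 2 in (x, y, z)."""
    x, y, z = X.T
    names = ["1", "x", "y", "z", "x^2", "xy", "xz", "y^2", "yz", "z^2"]
    cols = [np.ones_like(x), x, y, z, x * x, x * y, x * z, y * y, y * z, z * z]
    return np.column_stack(cols), names

def stlsq_screen(Theta, dx, threshold=0.5, iters=10):
    """Stage 1 (stand-in screen): sequentially thresholded least squares.
    Returns a boolean mask of library terms that survive the threshold."""
    xi = np.linalg.lstsq(Theta, dx, rcond=None)[0]
    for _ in range(iters):
        active = np.abs(xi) >= threshold
        xi = np.zeros_like(xi)
        if not active.any():
            break
        # Refit ordinary least squares on the surviving terms only
        xi[active] = np.linalg.lstsq(Theta[:, active], dx, rcond=None)[0]
    return np.abs(xi) >= threshold

def bayesian_refit(Theta_s, dx, prior_var=100.0):
    """Stage 2: Gaussian posterior over the screened coefficients.
    Prior N(0, prior_var * I); noise variance plugged in from OLS residuals."""
    n, p = Theta_s.shape
    beta_ols = np.linalg.lstsq(Theta_s, dx, rcond=None)[0]
    resid = dx - Theta_s @ beta_ols
    noise_var = resid @ resid / (n - p)
    precision = Theta_s.T @ Theta_s / noise_var + np.eye(p) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ Theta_s.T @ dx / noise_var
    return mean, cov

# Synthetic data: states sampled from a box (assumption), exact Lorenz y-derivative plus noise
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(-20, 20, n),
    rng.uniform(-25, 25, n),
    rng.uniform(0, 50, n),
])
x, y, z = X.T
dy = 28.0 * x - y - x * z + rng.normal(0.0, 0.5, n)

Theta, names = poly_library(X)
active = stlsq_screen(Theta, dy)
mean, cov = bayesian_refit(Theta[:, active], dy)
sd = np.sqrt(np.diag(cov))
selected = [name for name, keep in zip(names, active) if keep]
for name, m, s in zip(selected, mean, sd):
    print(f"{name:>4}: {m:+.3f}  (95% CI {m - 1.96 * s:+.3f} to {m + 1.96 * s:+.3f})")
```

In this sketch the posterior is computed only over the few screened terms, so the probabilistic step stays small, and the credible intervals come from a single posterior covariance rather than from refitting on many bootstrap resamples.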

Abstract

Our ability to predict, control, and ultimately understand complex systems rests on discovering the equations that govern their dynamics. Identifying these equations directly from noisy, limited observations has therefore become a central challenge in data-driven science, yet existing library-based sparse regression methods force a compromise between automation, statistical rigor, and computational efficiency. Here we develop Bayesian-ARGOS, a hybrid framework that reconciles these demands by combining rapid frequentist screening with focused Bayesian inference, enabling automated equation discovery with principled uncertainty quantification at a fraction of the computational cost of existing methods. Tested on seven chaotic systems under varying data scarcity and noise levels, Bayesian-ARGOS outperforms two state-of-the-art methods in most scenarios. It surpasses SINDy in data efficiency for all seven systems and in noise tolerance for six of them, and it reduces computational cost by two orders of magnitude compared with bootstrap-based ARGOS. The probabilistic formulation additionally enables a suite of standard statistical diagnostics, including influence analysis and multicollinearity detection, that expose otherwise opaque failure modes. When integrated with representation learning (SINDy-SHRED) for high-dimensional sea surface temperature reconstruction, Bayesian-ARGOS increases the yield of valid latent equations and significantly improves their long-horizon stability. Bayesian-ARGOS thus provides a principled, automated, and computationally efficient route from scarce and noisy observations to interpretable governing equations, offering a practical framework for equation discovery across scales, from benchmark chaotic systems to the latent dynamics underlying global climate patterns.
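The influence and multicollinearity diagnostics mentioned in the abstract are standard regression tools. As a hedged illustration only (this summary does not say which statistics Bayesian-ARGOS reports), the sketch below computes two textbook ones on an ordinary least-squares fit: the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the other columns, and Cook's distance, D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2), where e_i is the residual, h_ii the leverage, p the number of columns, and s^2 the residual variance. The design is synthetic and hypothetical: column `w` is deliberately almost a copy of `u`, and sample 0 has a corrupted response. The helper names `vif` and `cooks_distance` are likewise illustrative.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each non-intercept column of X (column 0 is the intercept)."""
    out = []
    for j in range(1, X.shape[1]):
        others = np.delete(X, j, axis=1)
        coef = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        resid = X[:, j] - others @ coef
        centered = X[:, j] - X[:, j].mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def cooks_distance(X, y):
    """Cook's distance for every sample of an OLS fit of y on X (X full column rank)."""
    n, p = X.shape
    Q, _ = np.linalg.qr(X)            # reduced QR, so the hat matrix is H = Q Q^T
    h = np.sum(Q ** 2, axis=1)        # leverages h_ii = diag(H)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta
    s2 = (e @ e) / (n - p)
    return e ** 2 / (p * s2) * h / (1.0 - h) ** 2

# Hypothetical library with a near-duplicate term and one corrupted observation
rng = np.random.default_rng(1)
n = 200
u = rng.normal(size=n)
v = rng.normal(size=n)
w = u + 0.05 * rng.normal(size=n)     # nearly collinear with u
X = np.column_stack([np.ones(n), u, v, w])
y = 2.0 * u - v + 0.1 * rng.normal(size=n)
y[0] += 20.0                          # corrupt sample 0

vifs = vif(X)
cooks = cooks_distance(X, y)
print("VIF (u, v, w):", np.round(vifs, 1))
print("most influential sample:", int(np.argmax(cooks)))
print("flagged (D > 4/n):", np.flatnonzero(cooks > 4 / n))
```

A VIF in the hundreds for `u` and `w` signals that the fit cannot separate those two terms, which in a candidate library shows up as unstable or sign-flipping coefficients; a common rule of thumb flags samples with D_i > 4/n as influential.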