Rethinking Forward Processes for Score-Based Data Assimilation in High Dimensions

arXiv stat.ML / 4/6/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

Key Points

  • The paper addresses data assimilation in high-dimensional dynamical systems, where classical Bayesian filters can be inaccurate or computationally infeasible.
  • It critiques prior score-based generative filtering approaches for specifying the forward process independently of the measurement model, forcing reliance on heuristic likelihood-score approximations that can accumulate errors.
  • It proposes a measurement-aware score-based filter (MASF) that constructs a measurement-aware forward process directly from the measurement equation, making the likelihood score analytically tractable.
  • For linear measurement settings, the method derives the exact likelihood score and combines it with a learned prior score to compute posterior scores for filtering.
  • Experiments on high-dimensional datasets show improved accuracy and stability compared with existing score-based filters across multiple configurations.

Abstract

Data assimilation is the process of estimating the time-evolving state of a dynamical system by integrating model predictions and noisy observations. It is commonly formulated as Bayesian filtering, but classical filters often struggle with accuracy or computational feasibility in high dimensions. Recently, score-based generative models have emerged as a scalable approach for high-dimensional data assimilation, enabling accurate modeling and sampling of complex distributions. However, existing score-based filters often specify the forward process independently of the data assimilation. As a result, the measurement-update step depends on heuristic approximations of the likelihood score, which can accumulate errors and degrade performance over time. Here, we propose a measurement-aware score-based filter (MASF) that defines a measurement-aware forward process directly from the measurement equation. This construction makes the likelihood score analytically tractable: for linear measurements, we derive the exact likelihood score and combine it with a learned prior score to obtain the posterior score. Numerical experiments covering a range of settings, including high-dimensional datasets, demonstrate improved accuracy and stability over existing score-based filters.