Noise Titration: Exact Distributional Benchmarking for Probabilistic Time Series Forecasting

arXiv stat.ML / 3/24/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that conventional time-series forecasting evaluation is largely unfalsifiable because it relies on passive observation of single historical trajectories rather than controllable perturbations.
  • It proposes “Noise Titration,” an interventionist benchmarking method that injects calibrated Gaussian observation noise into systems with known dynamics, enabling exact distributional evaluation via exact negative log-likelihoods and calibrated distributional tests (a minimal sketch of the recipe follows this list).
  • The work extends the Fern architecture into a probabilistic generative model that directly parameterizes covariance structures on the Symmetric Positive Definite (SPD) cone to improve calibrated joint forecasting without expensive Jacobian modeling.
  • Experiments suggest that state-of-the-art zero-shot foundation models fail under non-stationary regime shifts and elevated noise, consistent with a context-parroting mechanism, while Fern better preserves the invariant measure and multivariate geometry of the dynamics, yielding sharper and better-calibrated forecasts.

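To make the recipe concrete, here is a minimal sketch of the noise-titration setup, not the paper's code: integrate a known chaotic system, add Gaussian observation noise at several known variances, and score forecasts with the exact Gaussian negative log-likelihood. The Lorenz-63 system, RK4 integrator, noise grid, and oracle baseline below are illustrative assumptions.

```python
# Minimal noise-titration sketch: known dynamics + known Gaussian observation
# noise means the negative log-likelihood (NLL) of a forecast is exact, and the
# oracle that knows the latent state gives an exact reference floor.
import numpy as np

def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz-63 vector field (a standard chaotic benchmark system)."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])

def simulate(x0, n_steps, dt=0.01):
    """Integrate the latent trajectory with a simple RK4 scheme."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        x = xs[-1]
        k1 = lorenz63(x)
        k2 = lorenz63(x + 0.5 * dt * k1)
        k3 = lorenz63(x + 0.5 * dt * k2)
        k4 = lorenz63(x + dt * k3)
        xs.append(x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0)
    return np.stack(xs)

def gaussian_nll(y, mean, var):
    """Exact per-step NLL of an independent Gaussian forecast."""
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (y - mean) ** 2 / var, axis=-1)

rng = np.random.default_rng(0)
latent = simulate(x0=[1.0, 1.0, 1.0], n_steps=2000)

# "Titrate" observation noise: sweep a grid of known noise variances.
for noise_std in (0.1, 0.5, 1.0, 2.0):
    observed = latent + rng.normal(scale=noise_std, size=latent.shape)
    # Oracle forecast: the true latent state plus the known injected variance.
    # Any probabilistic forecaster's NLL can be compared to this exact floor.
    oracle_nll = gaussian_nll(observed, latent, noise_std ** 2).mean()
    print(f"noise_std={noise_std:>4}: oracle NLL per step = {oracle_nll:.3f}")
```
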
Abstract

Modern time series forecasting is evaluated almost entirely through passive observation of single historical trajectories, rendering claims about a model's robustness to non-stationarity fundamentally unfalsifiable. We propose a paradigm shift toward interventionist, exact-statistical benchmarking. By systematically titrating calibrated Gaussian observation noise into known chaotic and stochastic dynamical systems, we transform forecasting from a black-box sequence matching game into an exact distributional inference task. Because the underlying data-generating process and noise variance are mathematically explicit, evaluation can rely on exact negative log-likelihoods and calibrated distributional tests rather than heuristic approximations. To fully leverage this framework, we extend the Fern architecture into a probabilistic generative model that natively parameterizes the Symmetric Positive Definite (SPD) cone, outputting calibrated joint covariance structures without the computational bottleneck of generic Jacobian modeling. Under this rigorous evaluation, we find that state-of-the-art zero-shot foundation models behave consistently with the context-parroting mechanism, failing systematically under non-stationary regime shifts and elevated noise. In contrast, Fern explicitly captures the invariant measure and multivariate geometry of the underlying dynamics, maintaining structural fidelity and statistically sharp calibration precisely where massive sequence-matching models collapse.
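
The summary does not spell out Fern's exact SPD parameterization, so the following is only one common way to realize "covariances on the SPD cone": a head that predicts a lower-triangular Cholesky factor with a softplus-positive diagonal, which guarantees a valid joint Gaussian and an exact log-likelihood without any Jacobian computation. The class and parameter names (GaussianCovarianceHead, feature_dim, target_dim) are illustrative, not from the paper.

```python
# Sketch of an SPD covariance head: predict mean plus a lower-triangular
# Cholesky factor L with a strictly positive diagonal, so that L @ L.T is
# always symmetric positive definite and the joint Gaussian NLL is exact.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianCovarianceHead(nn.Module):
    """Maps a feature vector to a D-dimensional Gaussian with full covariance."""

    def __init__(self, feature_dim: int, target_dim: int):
        super().__init__()
        self.target_dim = target_dim
        n_tril = target_dim * (target_dim + 1) // 2  # entries of the lower triangle
        self.mean_head = nn.Linear(feature_dim, target_dim)
        self.tril_head = nn.Linear(feature_dim, n_tril)
        # Row/column indices of the lower triangle, reused at every forward pass.
        self.register_buffer("tril_idx", torch.tril_indices(target_dim, target_dim))

    def forward(self, features: torch.Tensor) -> torch.distributions.MultivariateNormal:
        mean = self.mean_head(features)
        raw = self.tril_head(features)
        batch = features.shape[:-1]
        L = torch.zeros(*batch, self.target_dim, self.target_dim, device=features.device)
        L[..., self.tril_idx[0], self.tril_idx[1]] = raw
        # Softplus keeps the diagonal strictly positive, so L @ L.T lies on the SPD cone.
        diag = torch.arange(self.target_dim)
        L[..., diag, diag] = F.softplus(L[..., diag, diag]) + 1e-5
        return torch.distributions.MultivariateNormal(loc=mean, scale_tril=L)

# Usage: exact joint NLL as a training or evaluation loss.
head = GaussianCovarianceHead(feature_dim=64, target_dim=3)
features = torch.randn(8, 64)   # e.g. encoder output for 8 series
targets = torch.randn(8, 3)     # next-step multivariate observations
nll = -head(features).log_prob(targets).mean()
print(float(nll))
```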