Forecast collapse of transformer-based models under squared loss in financial time series

arXiv stat.ML / 4/2/2026


Key Points

  • The paper analyzes trajectory forecasting for financial time series under squared loss when conditional structure is weak, showing that the Bayes-optimal predictor becomes effectively degenerate (flat prices and zero returns in typical setups).
  • In this degenerate regime, increasing model expressivity—such as with highly expressive Transformer-based predictors—does not reduce bias but instead creates spurious trajectory fluctuations.
  • The authors attribute the performance degradation to a variance-driven mechanism caused by reuse of noise, which increases prediction variance without improving the mean prediction.
  • They support the theory with numerical experiments on high-frequency EUR/USD exchange-rate data, where Transformer models produce larger trajectory-level forecasting errors than a simple linear benchmark for most windows.
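The variance-driven mechanism above can be illustrated with a toy simulation (not the paper's experiment): when returns are i.i.d. zero-mean noise, the Bayes-optimal forecast is identically zero, and any predictor that "reuses" observed noise as a forecast adds variance on top of the irreducible noise without reducing bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: returns are i.i.d. zero-mean noise, so the conditional
# expectation of the next return is 0 -- the Bayes-optimal forecast is flat.
n = 10_000
returns = rng.normal(0.0, 1.0, size=n)
target = returns[1:]

# Flat (Bayes-optimal) benchmark: predict zero at every step.
flat_pred = np.zeros(n - 1)

# Caricature of an over-expressive model: it echoes the last observed
# noise as its forecast, increasing prediction variance with no bias gain.
noisy_pred = returns[:-1]

mse_flat = np.mean((target - flat_pred) ** 2)
mse_noisy = np.mean((target - noisy_pred) ** 2)

print(f"flat MSE:  {mse_flat:.3f}")   # close to the noise variance, 1.0
print(f"noisy MSE: {mse_noisy:.3f}")  # roughly doubled: noise var + pred var
```

Because target and forecast noise are independent here, the spurious predictor's risk is approximately the sum of the two variances, matching the "increased variance, unchanged bias" story in the key points.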

Abstract

We study trajectory forecasting under squared loss for time series with weak conditional structure, using highly expressive prediction models. Building on the classical characterization of squared-loss risk minimization, we emphasize regimes in which the conditional expectation of future trajectories is effectively degenerate, leading to trivial Bayes-optimal predictors (flat for prices and zero for returns in standard financial settings). In this regime, increased model expressivity does not improve predictive accuracy but instead introduces spurious trajectory fluctuations around the optimal predictor. These fluctuations arise from the reuse of noise and result in increased prediction variance without any reduction in bias. This provides a process-level explanation for the degradation of Transformer-based forecasts on financial time series. We complement these theoretical results with numerical experiments on high-frequency EUR/USD exchange-rate data, analyzing the distribution of trajectory-level forecasting errors. The results show that Transformer-based models yield larger errors than a simple linear benchmark on a large majority of forecasting windows, consistent with the variance-driven mechanism identified by the theory.
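The squared-loss argument in the abstract can be made concrete with the standard risk decomposition (a sketch, using generic symbols not taken from the paper). For a predictor $\hat f$ of a future value $Y$ given information $X$,

$$
\mathbb{E}\big[(\hat f(X) - Y)^2\big]
= \underbrace{\mathbb{E}\big[(\hat f(X) - \mathbb{E}[Y \mid X])^2\big]}_{\text{excess risk of } \hat f}
+ \underbrace{\mathbb{E}\big[\operatorname{Var}(Y \mid X)\big]}_{\text{irreducible noise}} .
$$

When conditional structure is weak, $\mathbb{E}[Y \mid X]$ is effectively constant (zero for returns), so any data-driven fluctuation of $\hat f$ around that constant enters the first term directly: expressivity can only add variance, never reduce the irreducible term.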