Forecast collapse of transformer-based models under squared loss in financial time series

arXiv stat.ML / 4/2/2026


Key Points

  • The paper analyzes trajectory forecasting for financial time series under squared loss when conditional structure is weak, showing that the Bayes-optimal predictor becomes effectively degenerate (flat prices and zero returns in typical setups).
  • In this degenerate regime, increasing model expressivity—such as with highly expressive Transformer-based predictors—does not reduce bias but instead creates spurious trajectory fluctuations.
  • The authors attribute the performance degradation to a variance-driven mechanism caused by reuse of noise, which increases prediction variance without improving the mean prediction.
  • They support the theory with numerical experiments on high-frequency EUR/USD exchange-rate data, where Transformer models produce larger trajectory-level forecasting errors than a simple linear benchmark for most windows.
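The variance-driven mechanism above can be illustrated with a toy simulation (not the paper's experiment): when returns are i.i.d. zero-mean noise, the Bayes-optimal forecast is identically zero, and any predictor that "reuses" observed noise as a forecast adds variance on top of the irreducible noise without reducing bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: returns are i.i.d. zero-mean noise, so the conditional
# expectation of the next return is 0 -- the Bayes-optimal forecast is flat.
n = 10_000
returns = rng.normal(0.0, 1.0, size=n)
target = returns[1:]

# Flat (Bayes-optimal) benchmark: predict zero at every step.
flat_pred = np.zeros(n - 1)

# Caricature of an over-expressive model: it echoes the last observed
# noise as its forecast, increasing prediction variance with no bias gain.
noisy_pred = returns[:-1]

mse_flat = np.mean((target - flat_pred) ** 2)
mse_noisy = np.mean((target - noisy_pred) ** 2)

print(f"flat MSE:  {mse_flat:.3f}")   # close to the noise variance, 1.0
print(f"noisy MSE: {mse_noisy:.3f}")  # roughly doubled: noise var + pred var
```

Because target and forecast noise are independent here, the spurious predictor's risk is approximately the sum of the two variances, matching the "increased variance, unchanged bias" story in the key points.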

Abstract

We study trajectory forecasting under squared loss for time series with weak conditional structure, using highly expressive prediction models. Building on the classical characterization of squared-loss risk minimization, we emphasize regimes in which the conditional expectation of future trajectories is effectively degenerate, leading to trivial Bayes-optimal predictors (flat for prices and zero for returns in standard financial settings). In this regime, increased model expressivity does not improve predictive accuracy but instead introduces spurious trajectory fluctuations around the optimal predictor. These fluctuations arise from the reuse of noise and result in increased prediction variance without any reduction in bias. This provides a process-level explanation for the degradation of Transformer-based forecasts on financial time series. We complement these theoretical results with numerical experiments on high-frequency EUR/USD exchange-rate data, analyzing the distribution of trajectory-level forecasting errors. The results show that Transformer-based models yield larger errors than a simple linear benchmark on a large majority of forecasting windows, consistent with the variance-driven mechanism identified by the theory.
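The squared-loss argument in the abstract can be made concrete with the standard risk decomposition (a sketch, using generic symbols not taken from the paper). For a predictor $\hat f$ of a future value $Y$ given information $X$,

$$
\mathbb{E}\big[(\hat f(X) - Y)^2\big]
= \underbrace{\mathbb{E}\big[(\hat f(X) - \mathbb{E}[Y \mid X])^2\big]}_{\text{excess risk of } \hat f}
+ \underbrace{\mathbb{E}\big[\operatorname{Var}(Y \mid X)\big]}_{\text{irreducible noise}} .
$$

When conditional structure is weak, $\mathbb{E}[Y \mid X]$ is effectively constant (zero for returns), so any data-driven fluctuation of $\hat f$ around that constant enters the first term directly: expressivity can only add variance, never reduce the irreducible term.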