AI Navigate

A Controlled Comparison of Deep Learning Architectures for Multi-Horizon Financial Forecasting: Evidence from 918 Experiments

arXiv cs.LG / March 19, 2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The authors perform a controlled, multi-architecture comparison of nine deep learning models (Autoformer, DLinear, iTransformer, LSTM, ModernTCN, N-HiTS, PatchTST, TimesNet, TimeXer) across crypto, forex, and equity index markets at 4-hour and 24-hour horizons, based on 918 experiments.
  • They implement a strict five-stage protocol including fixed-seed Bayesian hyperparameter optimization, per-asset-class configuration freezing, multi-seed retraining, uncertainty aggregation, and statistical validation.
  • ModernTCN achieves the best mean rank (1.333) with a 75 percent first-place rate, followed by PatchTST.
  • The results suggest that architecture explains nearly all performance variance, with seed randomness contributing negligibly; separately, directional accuracy stays around 50 percent across configurations, indicating that MSE-trained models lack directional skill at hourly resolution.
  • The study highlights the importance of architectural inductive bias over parameter count and provides reproducible guidance for multi-step financial forecasting.
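The five-stage protocol above can be sketched as a toy loop. Everything here is illustrative: the "model" is a placeholder scoring function, random search stands in for the paper's Bayesian optimization, and the search space is invented; only the tune → freeze → retrain → aggregate → validate structure mirrors the text.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the search itself is reproducible

def fit_and_score(lr, seed):
    """Placeholder 'model': deterministic pseudo-MSE for a (learning rate, seed) pair."""
    local = np.random.default_rng(seed)
    noise = local.normal(scale=0.01)
    return (lr - 0.003) ** 2 + 0.05 + noise  # minimum near lr = 0.003

# Stage 1: fixed-seed hyperparameter search (random search as a stand-in for Bayesian HPO)
candidates = rng.uniform(1e-4, 1e-2, size=20)
scores = [fit_and_score(lr, seed=0) for lr in candidates]
best_lr = candidates[int(np.argmin(scores))]

# Stage 2: freeze the winning configuration (per asset class in the paper)
frozen_config = {"lr": best_lr}

# Stage 3: multi-seed retraining with the frozen configuration
seeds = [0, 1, 2]
mses = np.array([fit_and_score(frozen_config["lr"], s) for s in seeds])

# Stage 4: uncertainty aggregation across seeds
mean_mse, std_mse = mses.mean(), mses.std(ddof=1)

# Stage 5: statistical validation (here reduced to reporting the seed spread)
print(f"frozen lr={frozen_config['lr']:.4g}  MSE={mean_mse:.4f} +/- {std_mse:.4f}")
```

The point of the structure is that stages 1 and 2 happen once per asset class, so stage 3's seed variation measures training noise only, not tuning noise.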

Abstract

Multi-horizon price forecasting is central to portfolio allocation, risk management, and algorithmic trading, yet deep learning architectures have proliferated faster than rigorous financial benchmarks can evaluate them. This study provides a controlled comparison of nine architectures (Autoformer, DLinear, iTransformer, LSTM, ModernTCN, N-HiTS, PatchTST, TimesNet, and TimeXer) spanning Transformer, MLP, CNN, and RNN families across cryptocurrency, forex, and equity index markets at 4-hour and 24-hour horizons. A total of 918 experiments were conducted under a strict five-stage protocol including fixed-seed Bayesian hyperparameter optimization, configuration freezing per asset class, multi-seed retraining, uncertainty aggregation, and statistical validation. ModernTCN achieves the best mean rank (1.333) with a 75 percent first-place rate, followed by PatchTST (2.000). Results reveal a clear three-tier ranking structure and show that architecture explains nearly all performance variance, while seed randomness is negligible. Rankings remain stable across horizons despite 2 to 2.5 times error amplification. Directional accuracy remains near 50 percent across all configurations, indicating that MSE-trained models lack directional skill at hourly resolution. The findings highlight the importance of architectural inductive bias over raw parameter count and provide reproducible guidance for multi-step financial forecasting.
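The mean-rank and first-place-rate statistics reported above can be computed as follows. The error table is made up for illustration and does not reproduce the paper's 918-experiment results; only the metric definitions are taken from the text.

```python
import numpy as np

models = ["ModernTCN", "PatchTST", "DLinear"]
# rows = experiments (e.g. market x horizon), columns = each model's test MSE (fabricated)
errors = np.array([
    [0.10, 0.12, 0.15],
    [0.09, 0.11, 0.10],
    [0.20, 0.18, 0.25],
    [0.07, 0.08, 0.09],
])

# Rank models within each experiment: 1 = lowest error (double argsort, no ties assumed)
ranks = errors.argsort(axis=1).argsort(axis=1) + 1
mean_rank = ranks.mean(axis=0)                # average rank per model
first_place_rate = (ranks == 1).mean(axis=0)  # fraction of experiments won

for m, mr, fp in zip(models, mean_rank, first_place_rate):
    print(f"{m:10s} mean rank {mr:.3f}  first-place rate {fp:.0%}")
```

A lower mean rank with a high first-place rate, as reported for ModernTCN, means a model is not just good on average but wins most head-to-head comparisons outright.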
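Directional accuracy, the metric found to sit near 50 percent, is conventionally the fraction of steps where the forecast's predicted move matches the sign of the realized move. A minimal sketch with synthetic data (a random-walk price path and deliberately uninformative forecasts, so the result lands near chance):

```python
import numpy as np

rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(size=500)) + 100.0  # synthetic random-walk price path
forecasts = prices[:-1] + rng.normal(size=499)    # no-skill one-step-ahead forecasts

realized_move = np.sign(np.diff(prices))           # actual up/down at each step
predicted_move = np.sign(forecasts - prices[:-1])  # forecast up/down at each step

directional_accuracy = np.mean(predicted_move == realized_move)
print(f"directional accuracy: {directional_accuracy:.3f}")
```

An accuracy near 0.5, as here and as the paper reports for its MSE-trained models at hourly resolution, means the forecasts carry no exploitable sign information even if their squared error is low.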