AI Navigate

Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control

arXiv cs.LG / 3/20/2026

Key Points

  • The paper proposes an adaptive stock price prediction framework that identifies deviations from normal market conditions and routes data through specialized prediction pathways.
  • It comprises three components: an autoencoder trained on normal conditions to detect anomalies via reconstruction error, dual node transformer networks for stable and event-driven regimes, and a Soft Actor-Critic controller that tunes regime thresholds and blending weights based on prediction feedback.
  • In experiments on 20 S&P 500 stocks from 1982–2025, the method achieves 0.68% MAPE for one-day predictions without the RL controller and 0.59% with the full adaptive system, versus 0.80% for the baseline integrated node transformer. Directional accuracy reaches 72%, and MAPE stays below 0.85% during high-volatility periods in which baseline models exceed 1.5%.
  • Ablation studies show that each component contributes meaningfully: removing autoencoder routing degrades MAPE by about 36% in relative terms, removing the SAC controller by about 15%, and removing the dual-path architecture by about 7%.
  • The work suggests regime-aware, RL-guided forecasting can enhance stability and accuracy in financial markets and may inform deployment in trading systems that adapt to changing market regimes.
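
The routing idea in the key points can be illustrated with a minimal sketch. It is not the paper's implementation: the sigmoid soft gate, the `sharpness` parameter, and the function names `reconstruction_error`, `event_weight`, and `blend_predictions` are all assumptions made for illustration, and the two pathway predictions are plain numbers standing in for the outputs of the stable and event-driven transformers. In the paper, the threshold and blending weights are tuned by the SAC controller; here they are fixed by hand.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Per-sample mean squared error between inputs and their reconstructions."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.mean((x - x_hat) ** 2, axis=-1)

def event_weight(error, threshold, sharpness=10.0):
    """Hypothetical soft gate in (0, 1): near 0 below the threshold, near 1 above it."""
    error = np.asarray(error, dtype=float)
    return 1.0 / (1.0 + np.exp(-sharpness * (error - threshold)))

def blend_predictions(stable_pred, event_pred, weight):
    """Convex combination of the two pathway outputs."""
    stable_pred = np.asarray(stable_pred, dtype=float)
    event_pred = np.asarray(event_pred, dtype=float)
    return (1.0 - weight) * stable_pred + weight * event_pred

# Toy inputs: the first sample is reconstructed perfectly ("normal"),
# the second is reconstructed poorly ("anomalous").
x = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
x_hat = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])

errors = reconstruction_error(x, x_hat)          # [0.0, 1.0]
weights = event_weight(errors, threshold=0.5)    # close to 0, then close to 1
blended = blend_predictions([100.0, 100.0], [110.0, 110.0], weights)
print(errors, weights, blended)
```

With these toy values, the well-reconstructed sample is forecast almost entirely by the stable pathway and the poorly reconstructed one almost entirely by the event-driven pathway. A hard switch at the threshold would be an equally plausible reading of the summary; the soft gate is used here only because it makes the role of a blending weight explicit.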

Abstract

Stock markets exhibit regime-dependent behavior where prediction models optimized for stable conditions often fail during volatile periods. Existing approaches typically treat all market states uniformly or require manual regime labeling, which is expensive and quickly becomes stale as market dynamics evolve. This paper introduces an adaptive prediction framework that identifies deviations from normal market conditions and routes data through specialized prediction pathways. The architecture consists of three components: (1) an autoencoder trained on normal market conditions that identifies anomalous regimes through reconstruction error, (2) dual node transformer networks specialized for stable and event-driven market conditions, respectively, and (3) a Soft Actor-Critic (SAC) reinforcement learning controller that tunes the regime detection threshold and pathway blending weights based on prediction performance feedback. The reinforcement learning component enables the system to learn adaptive regime boundaries, defining anomalies as market states where standard prediction approaches fail. Experiments on 20 S&P 500 stocks spanning 1982 to 2025 demonstrate that the proposed framework achieves 0.68% MAPE for one-day predictions without the reinforcement learning controller and 0.59% MAPE with the full adaptive system, compared to 0.80% for the baseline integrated node transformer. Directional accuracy reaches 72% with the complete framework. The system maintains robust performance during high-volatility periods, with MAPE below 0.85% when baseline models exceed 1.5%. Ablation studies confirm that each component contributes meaningfully: removing autoencoder routing degrades MAPE by 36% in relative terms, followed by the SAC controller at 15% and the dual-path architecture at 7%.
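
To make the reported numbers easier to interpret, the sketch below computes MAPE, a directional accuracy score, and the relative change between two MAPE values. The abstract does not define directional accuracy, so the sign-agreement definition used here is an assumption (a common one), and the helper names `mape`, `directional_accuracy`, and `relative_change` are illustrative. The price arrays are toy values, not data from the paper.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def directional_accuracy(previous, actual, predicted):
    """Percent of steps where the predicted move has the same sign as the actual move.

    Assumed definition: both moves are measured relative to the previous price.
    """
    previous = np.asarray(previous, dtype=float)
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    same_sign = np.sign(actual - previous) == np.sign(predicted - previous)
    return 100.0 * np.mean(same_sign)

def relative_change(reference, value):
    """Percent change of value relative to reference."""
    return 100.0 * (value - reference) / reference

# Toy prices (illustrative only).
prev_close = [100.0, 100.0, 100.0, 100.0]
actual = [101.0, 99.0, 102.0, 98.0]
predicted = [100.5, 99.5, 99.0, 97.0]
print(mape(actual, predicted))
print(directional_accuracy(prev_close, actual, predicted))

# Relative changes computed from the MAPE figures reported in the abstract.
print(relative_change(0.80, 0.68))  # baseline -> framework without RL controller
print(relative_change(0.80, 0.59))  # baseline -> full adaptive system
print(relative_change(0.59, 0.68))  # full system -> controller removed
```

Applied to the reported figures, the full system's 0.59% MAPE is about 26% lower than the 0.80% baseline, and going from 0.59% back to 0.68% is an increase of about 15%, which is consistent with the ablation figure quoted for the SAC controller.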