Recency Biased Causal Attention for Time-series Forecasting

arXiv stat.ML · April 23, 2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that recency bias is a valuable inductive prior for sequential/time-series modeling, as it prioritizes nearby observations while still permitting long-range dependencies.
  • It highlights that standard Transformer attention is all-to-all and therefore misses causal/local temporal structure inherent in time-series data.
  • The authors introduce “recency-biased causal attention” by reweighting attention scores using a smooth heavy-tailed decay to strengthen local temporal relationships.
  • Experiments show consistent improvements in sequential modeling and time-series forecasting, with performance that is competitive and often superior on common benchmarks.
  • The method is framed as bringing Transformers closer to RNN-like dynamics, echoing the read/ignore/write behavior of RNN operations.
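The paper does not spell out the exact decay kernel in this summary, but the idea can be sketched as causal attention whose logits are reweighted by a smooth heavy-tailed function of temporal distance. The sketch below assumes a hypothetical power-law kernel `(1 + d)^(-alpha)` over the distance `d = i - j`; the function name, the kernel choice, and the `alpha` parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def recency_biased_causal_attention(q, k, v, alpha=1.0):
    """Causal attention with a heavy-tailed recency decay (illustrative sketch).

    Assumes a power-law kernel w(d) = (1 + d)^(-alpha) over the temporal
    distance d = i - j; the paper only specifies "a smooth heavy-tailed decay",
    so this particular kernel is a stand-in. alpha = 0 recovers plain causal
    attention; larger alpha strengthens the bias toward recent positions.
    """
    T, dk = q.shape
    scores = q @ k.T / np.sqrt(dk)                 # (T, T) raw attention logits
    dist = np.arange(T)[:, None] - np.arange(T)[None, :]
    causal = dist >= 0                             # attend only to past/present
    decay = (1.0 + np.maximum(dist, 0)) ** (-alpha)
    # Reweight multiplicatively, i.e. add log-decay to the logits;
    # non-causal positions are masked to -inf before the softmax.
    logits = np.where(causal, scores + np.log(decay), -np.inf)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the decay enters as an additive log-bias on the logits, data-driven long-range attention can still dominate when the content scores are large, which is how the mechanism keeps local emphasis without forbidding long-range dependencies.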

Abstract

Recency bias is a useful inductive prior for sequential modeling: it emphasizes nearby observations and can still allow longer-range dependencies. Standard Transformer attention lacks this property, relying on all-to-all interactions that overlook the causal and often local structure of temporal data. We propose a simple mechanism to introduce recency bias by reweighting attention scores with a smooth heavy-tailed decay. This adjustment strengthens local temporal dependencies without sacrificing the flexibility to capture broader and data-specific correlations. We show that recency-biased attention consistently improves sequential modeling, aligning Transformers more closely with the read, ignore, and write operations of RNNs. Finally, we demonstrate that our approach achieves competitive and often superior performance on challenging time-series forecasting benchmarks.