Recency Biased Causal Attention for Time-series Forecasting
arXiv stat.ML / 4/23/2026
Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that recency bias is a useful inductive bias for sequential and time-series modeling: it prioritizes nearby observations while still permitting long-range dependencies.
- It notes that standard Transformer attention is all-to-all and position-agnostic, so it does not by itself capture the local, causally ordered temporal structure of time-series data.
- The authors introduce “recency-biased causal attention”, which reweights attention scores with a smooth, heavy-tailed decay to strengthen local temporal relationships (see the sketch after this list).
- Experiments reportedly show consistent gains in sequential modeling and time-series forecasting, with performance competitive with, and often superior to, baselines on common benchmarks.
- The method is framed as bringing Transformers closer to RNN-like dynamics, echoing the read/ignore/write behavior of RNN operations.
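A minimal sketch of what such a reweighting could look like in PyTorch, assuming a power-law decay (1 + Δ)^(-α) over the query-key distance Δ, applied as a log-space additive bias before the causal softmax. The decay form, the strength parameter `alpha`, and the function name are illustrative assumptions, not the paper's exact parameterization:

```python
import torch
import torch.nn.functional as F

def recency_biased_causal_attention(q, k, v, alpha=1.0):
    """Causal attention with a heavy-tailed recency bias (illustrative sketch).

    q, k, v: (batch, seq_len, d) tensors.
    alpha: assumed decay strength; larger values bias harder toward recent positions.
    """
    b, n, d = q.shape
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # (b, n, n) raw attention logits

    # Distance of each key position j behind each query position i (clamped at 0).
    pos = torch.arange(n, device=q.device)
    dist = (pos.unsqueeze(1) - pos.unsqueeze(0)).clamp(min=0)  # (n, n), entry [i, j] = max(i - j, 0)

    # Heavy-tailed decay as a log-space bias: adding -alpha * log(1 + dist) to the
    # logits multiplies each attention weight by (1 + dist)^(-alpha) after softmax.
    bias = -alpha * torch.log1p(dist.float())

    # Causal mask: a query may attend only to itself and earlier positions.
    causal = torch.triu(torch.ones(n, n, dtype=torch.bool, device=q.device), diagonal=1)
    scores = (scores + bias).masked_fill(causal, float("-inf"))

    attn = F.softmax(scores, dim=-1)
    return attn @ v

# Example: self-attention over a random sequence.
x = torch.randn(2, 16, 32)
out = recency_biased_causal_attention(x, x, x, alpha=0.5)  # -> (2, 16, 32)
```

Applying the bias additively in log-space before the softmax keeps the attention distribution over visible positions normalized while smoothly down-weighting distant ones rather than cutting them off, which matches the “heavy-tailed” framing: long-range dependencies remain reachable, just de-emphasized.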