DatedGPT: Preventing Lookahead Bias in Large Language Models with Time-Aware Pretraining
arXiv cs.CL · March 13, 2026
📰 News · Signals & Early Trends · Models & Research
Key Points
- DatedGPT introduces twelve 1.3B-parameter language models trained from scratch on temporally partitioned data with strict annual cutoffs from 2013 to 2024 to prevent lookahead bias in financial backtesting.
- The models receive instruction fine-tuning on both general-domain and finance-specific datasets aligned to the same temporal cutoffs to constrain knowledge growth by time.
- Perplexity-based probing confirms that each model's knowledge is effectively bounded by its cutoff year, reducing leakage of future information.
- Evaluation on standard benchmarks shows competitive performance with existing models of similar scale despite the time-aware training.
- An interactive web demo allows users to query and compare responses from models across different cutoff years, illustrating practical time-aware forecasting workflows.
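The perplexity probing mentioned in the key points can be illustrated with a minimal sketch. The idea: a model whose training cutoff precedes an event should assign markedly higher perplexity to text describing that event than a model trained past it. The log-probability values and cutoff years below are hypothetical stand-ins, not numbers from the paper.

```python
import math

def perplexity(token_logprobs):
    # Perplexity = exp(mean negative log-likelihood per token)
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs for a sentence about a 2023 event,
# as scored by two models with different training cutoffs.
logprobs_cutoff_2022 = [-4.1, -3.8, -5.0, -4.4]  # event lies after this cutoff
logprobs_cutoff_2024 = [-1.2, -0.9, -1.5, -1.1]  # event lies within training data

ppl_2022 = perplexity(logprobs_cutoff_2022)
ppl_2024 = perplexity(logprobs_cutoff_2024)

# A large gap suggests the 2022-cutoff model lacks knowledge of the event,
# i.e., temporal partitioning has limited leakage of future information.
assert ppl_2022 > ppl_2024
print(f"cutoff-2022 PPL: {ppl_2022:.1f}, cutoff-2024 PPL: {ppl_2024:.1f}")
```

In practice the per-token log-probabilities would come from scoring the same probe sentences under each of the twelve models, then comparing perplexity across cutoff years.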
Related Articles
Easing Veterans' Burden of Training Junior Engineers: Generating PLC-Control "Ladder Diagrams" with AI
日経XTECH
Jeff Bezos reportedly wants $100 billion to buy and transform old manufacturing firms with AI
TechCrunch
AI Can Write Your Code. Who's Testing Your Thinking?
Dev.to
‘Uncanny Valley’: Nvidia’s ‘Super Bowl of AI,’ Tesla Disappoints, and Meta’s VR Metaverse ‘Shutdown’
Wired
[R] Weekly digest: arXiv AI security papers translated for practitioners -- Cascade (cross-stack CVE+Rowhammer attacks on compound AI), LAMLAD (dual-LLM adversarial ML, 97% evasion), OpenClaw (4 vuln classes in agent frameworks)
Reddit r/MachineLearning