ARROW: Augmented Replay for RObust World models
arXiv cs.LG · March 13, 2026
Key Points
- ARROW is a model-based continual reinforcement learning algorithm that extends DreamerV3 with a memory-efficient, distribution-matching replay buffer to mitigate catastrophic forgetting.
- It maintains two complementary buffers: a short-term buffer holding recent experience and a long-term buffer whose distribution-matching sampling preserves diversity across tasks (see the sketch after this list).
- Evaluation on Atari (tasks without shared structure) and Procgen CoinRun variants (tasks with shared structure) shows ARROW reduces forgetting compared to baselines with the same replay buffer size, while maintaining forward transfer.
- The approach draws inspiration from neuroscience, where the brain replays experiences to a predictive world model rather than directly to the policy.
- The results highlight the potential of model-based RL with bio-inspired replay for continual learning and warrant further research.
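
A minimal sketch of how such a two-buffer scheme could work, assuming a FIFO short-term buffer and a long-term buffer that balances storage across tasks with reservoir-style retention. The class name `DualReplayBuffer`, the `recent_fraction` mixing parameter, and the balanced-eviction policy are illustrative assumptions, not ARROW's actual mechanism:

```python
import random
from collections import deque, defaultdict


class DualReplayBuffer:
    """Two-buffer replay: a FIFO short-term buffer for recent experience,
    plus a long-term buffer that keeps storage roughly balanced across
    tasks so earlier tasks stay represented."""

    def __init__(self, short_capacity: int, long_capacity: int):
        self.short = deque(maxlen=short_capacity)  # recent experience, FIFO
        self.long_capacity = long_capacity
        self.long = defaultdict(list)   # task_id -> retained transitions
        self.seen = defaultdict(int)    # task_id -> transitions observed

    def _long_size(self) -> int:
        return sum(len(items) for items in self.long.values())

    def add(self, transition, task_id) -> None:
        self.short.append(transition)
        self.seen[task_id] += 1
        if self._long_size() < self.long_capacity:
            self.long[task_id].append(transition)
            return
        # Buffer full: evict from the most-represented task so storage
        # stays roughly balanced across tasks.
        biggest = max(self.long, key=lambda t: len(self.long[t]))
        if biggest == task_id:
            # Reservoir-style acceptance: the retained items remain an
            # approximately uniform sample of this task's whole stream.
            slot = random.randrange(self.seen[task_id])
            if slot < len(self.long[task_id]):
                self.long[task_id][slot] = transition
        else:
            self.long[biggest].pop(random.randrange(len(self.long[biggest])))
            self.long[task_id].append(transition)

    def sample(self, batch_size: int, recent_fraction: float = 0.5):
        """Mix recent and long-term experience in one training batch."""
        n_recent = min(int(batch_size * recent_fraction), len(self.short))
        pool = [x for items in self.long.values() for x in items]
        n_old = min(batch_size - n_recent, len(pool))
        return random.sample(list(self.short), n_recent) + random.sample(pool, n_old)
```

Mixing recent and long-term samples in each batch lets the world model keep fitting the current task while rehearsing earlier ones, and the balanced eviction keeps old tasks from being crowded out as new experience streams in.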