DARLING: Detection Augmented Reinforcement Learning with Non-Stationary Guarantees
arXiv cs.LG / 4/21/2026
Key Points
- The paper studies model-free reinforcement learning in piecewise-stationary episodic finite-horizon MDPs where both rewards and transitions may change multiple times without the agent knowing when.
- It introduces DARLING, a modular “detection-augmented” wrapper that can be applied to both tabular and linear MDP settings without requiring prior information about change points (a minimal code sketch of the wrapper pattern follows this list).
- The authors establish improved dynamic regret guarantees (defined after this list) for DARLING under explicit change-point separation and reachability assumptions.
- They also prove the first minimax lower bounds for piecewise-stationary RL in tabular and linear MDPs, which show that DARLING is nearly optimal.
- Experiments on standard benchmarks show DARLING outperforming existing state-of-the-art methods across a range of non-stationary scenarios.
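
For context, “dynamic regret” here compares the agent against the per-episode optimal policy. The definition below is the standard one from the non-stationary RL literature, with notation assumed for illustration rather than taken from the paper:

$$
\text{D-Regret}(K) \;=\; \sum_{k=1}^{K} \left( V_1^{\star,k}\!\left(s_1^k\right) - V_1^{\pi_k}\!\left(s_1^k\right) \right)
$$

where $V_1^{\star,k}$ is the optimal value function of the MDP active in episode $k$, $\pi_k$ is the policy the agent actually plays, and $s_1^k$ is the episode's initial state. Under piecewise stationarity, $V_1^{\star,k}$ stays fixed between change points and jumps whenever rewards or transitions shift, which is what makes sublinear dynamic regret non-trivial when the agent does not know the change times.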
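The detection-augmented wrapper pattern itself is easy to illustrate. The sketch below is a minimal, hypothetical instance in Python: a tabular Q-learning base learner is restarted whenever a crude mean-shift test on recent episode returns fires. The class name, detector statistic, and thresholds are illustrative assumptions, not DARLING's actual components, which use calibrated detection with provable regret guarantees.

```python
import numpy as np

class DetectionAugmentedAgent:
    """Hypothetical sketch of a detection-augmented wrapper:
    a tabular Q-learning base learner is reset whenever a simple
    change detector flags a shift in observed episode returns."""

    def __init__(self, n_states, n_actions, horizon,
                 lr=0.1, window=50, threshold=1.0):
        self.nS, self.nA, self.H = n_states, n_actions, horizon
        self.lr = lr                # Q-learning step size (assumed)
        self.window = window        # detector's sliding-window size (assumed)
        self.threshold = threshold  # mean-shift detection threshold (assumed)
        self._reset_learner()

    def _reset_learner(self):
        # Optimistic initialization; forget all pre-change experience.
        self.Q = np.full((self.H, self.nS, self.nA), float(self.H))
        self.returns = []           # episode returns since the last restart

    def act(self, h, s, eps=0.1):
        # Epsilon-greedy action selection at step h of the episode.
        if np.random.rand() < eps:
            return np.random.randint(self.nA)
        return int(np.argmax(self.Q[h, s]))

    def update(self, h, s, a, r, s_next):
        # One-step Q-learning backup within the finite horizon.
        future = np.max(self.Q[h + 1, s_next]) if h + 1 < self.H else 0.0
        self.Q[h, s, a] += self.lr * (r + future - self.Q[h, s, a])

    def end_episode(self, ep_return):
        # Feed the detector; restart the base learner on detection.
        self.returns.append(ep_return)
        if self._change_detected():
            self._reset_learner()

    def _change_detected(self):
        # Crude mean-shift test: compare the last window of returns
        # to the window before it. A principled detector would use a
        # calibrated statistic (e.g., a GLR test) with guarantees.
        if len(self.returns) < 2 * self.window:
            return False
        recent = np.mean(self.returns[-self.window:])
        older = np.mean(self.returns[-2 * self.window:-self.window])
        return abs(recent - older) > self.threshold
```

The modularity the summary describes maps onto the two independent slots in this sketch: the base learner in `_reset_learner`/`update` (a linear-MDP learner would slot in the same way) and the detection test in `_change_detected`.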