Model-Based Reinforcement Learning for Control under Time-Varying Dynamics

arXiv cs.LG / 4/3/2026

Key Points

  • The paper studies reinforcement learning for control when system dynamics are non-stationary and change across episodes, a common real-world challenge caused by drift, wear, and operating-condition shifts.
  • It frames the problem as continual model-based RL and analyzes it using Gaussian process dynamics models under frequentist variation-budget assumptions.
  • The authors show that when non-stationarity persists, outdated data must be explicitly down-weighted or limited to keep uncertainty calibrated and preserve dynamic-regret guarantees (see the sketch after this list).
  • Building on these theoretical insights, they introduce an optimistic model-based RL algorithm that uses adaptive data buffers to manage the influence of legacy data (a buffer sketch follows the abstract below).
  • Experiments on continuous-control benchmarks with non-stationary dynamics indicate improved performance for the proposed approach.
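
The third point describes down-weighting stale data in a GP dynamics model only at a high level. As a concrete illustration, here is a minimal numpy sketch of one standard way to do this: inflate each observation's noise by an age-based forgetting factor, so stale data pulls the posterior mean less and predictive variance stays wider where only old data exists. This is not the paper's algorithm; the function names, the factor `rho`, and the age-based schedule are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between row-wise input matrices."""
    d2 = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def forgetting_gp_posterior(X, y, ages, X_star, noise_var=1e-2, rho=0.9):
    """GP posterior where older observations carry inflated noise.

    ages[i] counts how many episodes ago sample i was collected; its
    effective noise is noise_var / rho**ages[i], so stale points are
    down-weighted and uncertainty re-inflates where only old data exists.
    """
    weights = rho ** np.asarray(ages, dtype=float)   # in (0, 1], 1 = fresh
    K = rbf_kernel(X, X) + np.diag(noise_var / weights)
    K_star = rbf_kernel(X_star, X)
    mean = K_star @ np.linalg.solve(K, y)
    cov = rbf_kernel(X_star, X_star) - K_star @ np.linalg.solve(K, K_star.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Toy usage: two fresh samples near x = 0, one 20-episode-old sample at x = 2.
X = np.array([[0.0], [0.1], [2.0]])
y = np.sin(X[:, 0])
ages = np.array([0, 0, 20])
mu, sd = forgetting_gp_posterior(X, y, ages, np.array([[0.0], [2.0]]))
# sd is larger at x = 2: the stale sample no longer pins the model down there.
```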

Abstract

Learning-based control methods typically assume stationary system dynamics, an assumption often violated in real-world systems due to drift, wear, or changing operating conditions. We study reinforcement learning for control under time-varying dynamics. We consider a continual model-based reinforcement learning setting in which an agent repeatedly learns and controls a dynamical system whose transition dynamics evolve across episodes. We analyze the problem using Gaussian process dynamics models under frequentist variation-budget assumptions. Our analysis shows that persistent non-stationarity requires explicitly limiting the influence of outdated data to maintain calibrated uncertainty and meaningful dynamic regret guarantees. Motivated by these insights, we propose a practical optimistic model-based reinforcement learning algorithm with adaptive data buffer mechanisms and demonstrate improved performance on continuous control benchmarks with non-stationary dynamics.
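
The abstract names "adaptive data buffer mechanisms" and an optimistic algorithm without giving details. The sketch below shows two generic ingredients such a method could combine: a sliding-window transition buffer whose capacity can be shrunk to drop legacy data outright (the hard-limit counterpart of the soft down-weighting sketched above), and a one-step UCB-style action rule that adds an uncertainty bonus to the predicted reward. All names (`AdaptiveBuffer`, `shrink`, `optimistic_action`), the eviction policy, and the `beta` bonus are assumptions, not the paper's method.

```python
from collections import deque
import numpy as np

class AdaptiveBuffer:
    """Fixed-capacity transition buffer that evicts the oldest data first."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, state, action, next_state):
        self.data.append((np.asarray(state), np.asarray(action),
                          np.asarray(next_state)))

    def shrink(self, new_capacity):
        """Keep only the most recent transitions, e.g. after a suspected
        dynamics shift; dropped data no longer influences the model."""
        self.data = deque(list(self.data)[-new_capacity:], maxlen=new_capacity)

    def training_set(self):
        """Stack transitions into (inputs, targets) for a dynamics model."""
        states, actions, next_states = map(np.stack, zip(*self.data))
        return np.hstack([states, actions]), next_states

def optimistic_action(candidates, predict, reward, beta=2.0):
    """One-step optimistic choice: reward at the predicted next state plus a
    beta-scaled bonus for predictive uncertainty."""
    scores = []
    for a in candidates:
        mu, sigma = predict(a)                 # model's mean and std for a
        scores.append(reward(mu) + beta * float(np.sum(sigma)))
    return candidates[int(np.argmax(scores))]
```

A hard window and a forgetting factor are two ends of the same trade-off the key points describe ("down-weighted or limited"): the window forgets abruptly but keeps the model small, while soft weighting degrades old data gradually at full memory cost.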