Neural ODE and SDE Models for Adaptation and Planning in Model-Based Reinforcement Learning

arXiv cs.LG / 3/25/2026


Key Points

  • The paper applies neural ordinary and stochastic differential equation models (neural ODEs and neural SDEs) to represent stochastic transition dynamics in model-based reinforcement learning for both fully and partially observed settings.
  • Experiments indicate that neural SDEs better capture stochasticity in transition dynamics, producing high-performing policies with improved sample efficiency, especially in difficult scenarios.
  • The authors use neural ODE/SDE inverse modeling to adapt policies to changes in environment dynamics with only limited additional interactions in the new environment.
  • For partial observability, they propose a latent SDE model that combines an ODE with a GAN-trained stochastic component in latent space, yielding a strong baseline on stochastic continuous-control benchmarks.
  • The work demonstrates action-conditional latent SDEs as an effective approach for RL planning under stochastic transitions and releases accompanying code on GitHub.
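The central object in these points is an action-conditional neural SDE transition model. A minimal sketch of the idea is below, using small drift and diffusion networks integrated with Euler-Maruyama; the class name, network sizes, and step size are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class NeuralSDE(nn.Module):
    """Illustrative action-conditional neural SDE transition model.

    Models dx = f(x, a) dt + g(x, a) dW, where f (drift) and g (diffusion)
    are small MLPs, integrated with the Euler-Maruyama scheme.
    All names/sizes here are assumptions for illustration.
    """

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.drift = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim))
        self.diffusion = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim), nn.Softplus())  # keep g(x, a) >= 0

    def step(self, x, a, dt=0.05):
        """One Euler-Maruyama step: x + f*dt + g*sqrt(dt)*eps."""
        xa = torch.cat([x, a], dim=-1)
        noise = torch.randn_like(x) * dt ** 0.5
        return x + self.drift(xa) * dt + self.diffusion(xa) * noise

model = NeuralSDE(state_dim=3, action_dim=1)
x = torch.zeros(8, 3)   # batch of 8 states
a = torch.zeros(8, 1)   # batch of actions
for _ in range(10):     # 10-step stochastic rollout for planning
    x = model.step(x, a)
```

A planner would roll out many such stochastic trajectories per candidate action sequence and score them with a learned reward model; dropping the diffusion term recovers a deterministic neural-ODE-style model.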

Abstract

We investigate neural ordinary and stochastic differential equations (neural ODEs and SDEs) to model stochastic dynamics in fully and partially observed environments within a model-based reinforcement learning (RL) framework. Through a sequence of simulations, we show that neural SDEs more effectively capture the inherent stochasticity of transition dynamics, enabling high-performing policies with improved sample efficiency in challenging scenarios. We leverage neural ODEs and SDEs for efficient policy adaptation to changes in environment dynamics via inverse models, requiring only limited interactions with the new environment. To address partial observability, we introduce a latent SDE model that combines an ODE with a GAN-trained stochastic component in latent space. Policies derived from this model provide a strong baseline, outperforming or matching general model-based and model-free approaches across stochastic continuous-control benchmarks. This work demonstrates the applicability of action-conditional latent SDEs for RL planning in environments with stochastic transitions. Our code is available at: https://github.com/ChaoHan-UoS/NeuralRL