Efficient Reinforcement Learning using Linear Koopman Dynamics for Nonlinear Robotic Systems

arXiv cs.RO / April 23, 2026


Key Points

  • The paper introduces a model-based reinforcement learning framework for nonlinear robotic systems that uses Koopman operator theory to learn linear “lifted” dynamics for closed-loop control.
  • It builds an actor-critic policy optimization scheme where the policy directly parameterizes a closed-loop controller based on the learned linear model.
  • To cut computational cost and reduce error from long-horizon rollouts, it estimates policy gradients using one-step predictions instead of multi-step propagation.
  • The method supports online mini-batch policy gradient updates from streamed interaction data, enabling continual improvement during training.
  • Experiments on simulated nonlinear control benchmarks and on real hardware (a Kinova Gen3 arm and a Unitree Go1 quadruped) show better sample efficiency than model-free RL, stronger control performance than model-based baselines, and performance comparable to classical controllers that assume exact dynamics.
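The first key point, learning linear lifted dynamics via Koopman operator theory, can be sketched with an EDMD-style least-squares fit. The dictionary of observables in `lift` below is a hypothetical choice for illustration; the paper's actual lifting functions and fitting procedure are not specified in this summary.

```python
import numpy as np

def lift(x):
    # Hypothetical dictionary of observables: the state itself plus
    # simple nonlinear features. The paper's dictionary may differ.
    return np.concatenate([x, np.sin(x), np.cos(x)])

def fit_koopman(X, U, X_next):
    """Fit lifted linear dynamics z_next ~ A z + B u by least squares
    over a batch of (x, u, x_next) transitions."""
    Z = np.array([lift(x) for x in X])            # (N, d) lifted states
    Z_next = np.array([lift(x) for x in X_next])  # (N, d) lifted successors
    ZU = np.hstack([Z, U])                        # (N, d + m) regressors
    # Solve ZU @ W ~ Z_next for the stacked operator W = [A; B]^T.
    W, *_ = np.linalg.lstsq(ZU, Z_next, rcond=None)
    d = Z.shape[1]
    return W[:d].T, W[d:].T                       # A (d, d), B (d, m)
```

Once `A` and `B` are in hand, one-step predictions in the lifted space are just a matrix multiply, which is what makes the policy-gradient estimation in the later key points cheap.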

Abstract

This paper presents a model-based reinforcement learning (RL) framework for optimal closed-loop control of nonlinear robotic systems. The proposed approach learns linear lifted dynamics through Koopman operator theory and integrates the resulting model into an actor-critic architecture for policy optimization, where the policy represents a parameterized closed-loop controller. To reduce computational cost and mitigate model rollout errors, policy gradients are estimated using one-step predictions of the learned dynamics rather than multi-step propagation. This leads to an online mini-batch policy gradient framework that enables policy improvement from streamed interaction data. The proposed framework is evaluated on several simulated nonlinear control benchmarks and two real-world hardware platforms, including a Kinova Gen3 robotic arm and a Unitree Go1 quadruped. Experimental results demonstrate improved sample efficiency over model-free RL baselines, superior control performance relative to model-based RL baselines, and control performance comparable to classical model-based methods that rely on exact system dynamics.
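The abstract's central trick, estimating policy gradients from one-step predictions of the learned linear model rather than multi-step rollouts, can be sketched for a linear feedback policy u = -Kz in the lifted space. The quadratic control cost `R` and quadratic critic `P` below are illustrative assumptions; the paper's actual actor-critic parameterization is not detailed in this summary.

```python
import numpy as np

def one_step_policy_gradient(A, B, R, P, K, Z_batch):
    """Analytic gradient of the one-step objective
        J(K) = mean_z [ u^T R u + z_next^T P z_next ],  u = -K z,
    where z_next = (A - B K) z is a ONE-step prediction of the learned
    lifted linear model (no multi-step propagation). The state cost
    z^T Q z does not depend on K, so it drops out of the gradient."""
    S = Z_batch.T @ Z_batch / len(Z_batch)  # empirical second moment of lifted states
    Acl = A - B @ K                         # closed-loop lifted dynamics
    # d/dK of u^T R u is 2 R K S; d/dK of z_next^T P z_next is -2 B^T P Acl S.
    return 2.0 * (R @ K - B.T @ P @ Acl) @ S

def policy_gradient_step(A, B, R, P, K, Z_batch, lr=1e-2):
    # One mini-batch update on streamed lifted states, as in the
    # online policy-improvement loop the abstract describes.
    return K - lr * one_step_policy_gradient(A, B, R, P, K, Z_batch)
```

Because each update needs only the current batch of lifted states and a single matrix product through `A - B K`, the per-update cost stays flat as training proceeds, and long-horizon model rollout error never enters the gradient estimate.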