Efficient Reinforcement Learning using Linear Koopman Dynamics for Nonlinear Robotic Systems

arXiv cs.RO / April 23, 2026


Key Points

  • The paper introduces a model-based reinforcement learning framework for nonlinear robotic systems that uses Koopman operator theory to learn linear “lifted” dynamics for closed-loop control.
  • It builds an actor-critic policy optimization scheme where the policy directly parameterizes a closed-loop controller based on the learned linear model.
  • To cut computational cost and reduce error from long-horizon rollouts, it estimates policy gradients using one-step predictions instead of multi-step propagation.
  • The method supports online mini-batch policy gradient updates from streamed interaction data, enabling continual improvement during training.
  • Experiments on simulated nonlinear control benchmarks and on real hardware (a Kinova Gen3 arm and a Unitree Go1 quadruped) show better sample efficiency than model-free RL, stronger control performance than model-based baselines, and performance comparable to classical controllers that assume exact dynamics.
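The first key point, learning linear lifted dynamics via Koopman operator theory, can be sketched with an EDMD-style least-squares fit. The dictionary of observables in `lift` below is a hypothetical choice for illustration; the paper's actual lifting functions and fitting procedure are not specified in this summary.

```python
import numpy as np

def lift(x):
    # Hypothetical dictionary of observables: the state itself plus
    # simple nonlinear features. The paper's dictionary may differ.
    return np.concatenate([x, np.sin(x), np.cos(x)])

def fit_koopman(X, U, X_next):
    """Fit lifted linear dynamics z_next ~ A z + B u by least squares
    over a batch of (x, u, x_next) transitions."""
    Z = np.array([lift(x) for x in X])            # (N, d) lifted states
    Z_next = np.array([lift(x) for x in X_next])  # (N, d) lifted successors
    ZU = np.hstack([Z, U])                        # (N, d + m) regressors
    # Solve ZU @ W ~ Z_next for the stacked operator W = [A; B]^T.
    W, *_ = np.linalg.lstsq(ZU, Z_next, rcond=None)
    d = Z.shape[1]
    return W[:d].T, W[d:].T                       # A (d, d), B (d, m)
```

Once `A` and `B` are in hand, one-step predictions in the lifted space are just a matrix multiply, which is what makes the policy-gradient estimation in the later key points cheap.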

Abstract

This paper presents a model-based reinforcement learning (RL) framework for optimal closed-loop control of nonlinear robotic systems. The proposed approach learns linear lifted dynamics through Koopman operator theory and integrates the resulting model into an actor-critic architecture for policy optimization, where the policy represents a parameterized closed-loop controller. To reduce computational cost and mitigate model rollout errors, policy gradients are estimated using one-step predictions of the learned dynamics rather than multi-step propagation. This leads to an online mini-batch policy gradient framework that enables policy improvement from streamed interaction data. The proposed framework is evaluated on several simulated nonlinear control benchmarks and two real-world hardware platforms, including a Kinova Gen3 robotic arm and a Unitree Go1 quadruped. Experimental results demonstrate improved sample efficiency over model-free RL baselines, superior control performance relative to model-based RL baselines, and control performance comparable to classical model-based methods that rely on exact system dynamics.
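The abstract's central trick, estimating policy gradients from one-step predictions of the learned linear model rather than multi-step rollouts, can be sketched for a linear feedback policy u = -Kz in the lifted space. The quadratic control cost `R` and quadratic critic `P` below are illustrative assumptions; the paper's actual actor-critic parameterization is not detailed in this summary.

```python
import numpy as np

def one_step_policy_gradient(A, B, R, P, K, Z_batch):
    """Analytic gradient of the one-step objective
        J(K) = mean_z [ u^T R u + z_next^T P z_next ],  u = -K z,
    where z_next = (A - B K) z is a ONE-step prediction of the learned
    lifted linear model (no multi-step propagation). The state cost
    z^T Q z does not depend on K, so it drops out of the gradient."""
    S = Z_batch.T @ Z_batch / len(Z_batch)  # empirical second moment of lifted states
    Acl = A - B @ K                         # closed-loop lifted dynamics
    # d/dK of u^T R u is 2 R K S; d/dK of z_next^T P z_next is -2 B^T P Acl S.
    return 2.0 * (R @ K - B.T @ P @ Acl) @ S

def policy_gradient_step(A, B, R, P, K, Z_batch, lr=1e-2):
    # One mini-batch update on streamed lifted states, as in the
    # online policy-improvement loop the abstract describes.
    return K - lr * one_step_policy_gradient(A, B, R, P, K, Z_batch)
```

Because each update needs only the current batch of lifted states and a single matrix product through `A - B K`, the per-update cost stays flat as training proceeds, and long-horizon model rollout error never enters the gradient estimate.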