Wasserstein Formulation of Reinforcement Learning: An Optimal Transport Perspective on Policy Optimization

arXiv cs.LG / 4/17/2026


Key Points

  • The paper proposes a geometric reinforcement learning (RL) framework that treats policies as mappings into the Wasserstein space of action probability distributions.
  • It establishes a Riemannian structure based on stationary distributions, defines the policy tangent space, and studies geodesics with attention to measurability issues for the associated vector fields.
  • The authors formulate a general RL optimization objective and construct a gradient flow using Otto's calculus, deriving the gradient and the Hessian of an energy functional to obtain a rigorous second-order analysis (the underlying objects are sketched after this list).
  • The approach is validated with numerical experiments in low-dimensional settings and extended to higher-dimensional cases by parameterizing the policy with a neural network and optimizing via an ergodic approximation of the cost.
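
To make these objects concrete, here is a hedged mathematical sketch of the standard constructions the key points refer to. The Wasserstein-2 distance and Otto's gradient-flow identification are textbook; the stationary-distribution-weighted policy metric below is an assumed form of the Riemannian structure, not necessarily the paper's exact definition, and the notation (A for the action space, S for the state space, ρ for the stationary distribution, F for the energy) is ours.

```latex
% Wasserstein-2 distance between two action distributions \mu, \nu on A:
W_2^2(\mu,\nu) = \inf_{\gamma \in \Pi(\mu,\nu)} \int_{A \times A} \|a - a'\|^2 \, \mathrm{d}\gamma(a, a')

% Assumed form of a policy metric weighted by the stationary state distribution \rho:
d(\pi, \pi')^2 = \int_S W_2^2\big(\pi(\cdot \mid s),\, \pi'(\cdot \mid s)\big) \, \mathrm{d}\rho(s)

% Otto's calculus: the W_2 gradient flow of an energy functional F is the
% continuity equation driven by the first variation \delta F / \delta \mu:
\partial_t \mu_t = \nabla \cdot \Big( \mu_t \, \nabla \tfrac{\delta F}{\delta \mu}(\mu_t) \Big)
```

In this picture, the gradient and Hessian that the paper computes can be read as the first- and second-order data of the energy F along such flows.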

Abstract

We present a geometric framework for Reinforcement Learning (RL) that views policies as maps into the Wasserstein space of action probabilities. First, we define a Riemannian structure induced by stationary distributions, proving its existence in a general context. We then define the tangent space of policies and characterize the geodesics, specifically addressing the measurability of the vector fields that map the state space into the tangent space of probability measures over the action space. Next, we formulate a general RL optimization problem and construct a gradient flow using Otto's calculus. We compute the gradient and the Hessian of the energy, providing a formal second-order analysis. Finally, we illustrate the method with numerical examples for low-dimensional problems, computing the gradient directly from our theoretical formalism. For high-dimensional problems, we parameterize the policy using a neural network and optimize it based on an ergodic approximation of the cost.
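
To illustrate the abstract's last point, here is a minimal sketch, assuming a PyTorch Gaussian policy and a generic per-step cost, of optimizing a neural policy against an ergodic (long-run time-average) approximation of the cost. The names `GaussianPolicy`, `ergodic_cost`, `env_step`, and `cost`, the network shape, and the single-trajectory estimator are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """State-conditional Gaussian policy pi_theta(. | s)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.mean = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def dist(self, s):
        return torch.distributions.Normal(self.mean(s), self.log_std.exp())

def ergodic_cost(policy, env_step, cost, s0, horizon=1000):
    """Time-average of cost(s_t, a_t) along one trajectory: an ergodic
    approximation of the expected cost under the stationary distribution."""
    s, total = s0, torch.zeros(())
    for _ in range(horizon):
        a = policy.dist(s).rsample()   # reparameterized sample keeps pathwise gradients
        total = total + cost(s, a)
        s = env_step(s, a).detach()    # simplification: no gradient through the dynamics
    return total / horizon

# Usage sketch (env_step, cost, and dimensions are hypothetical):
# policy = GaussianPolicy(state_dim=4, action_dim=2)
# opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
# for _ in range(200):
#     opt.zero_grad()
#     loss = ergodic_cost(policy, env_step, cost, s0=torch.zeros(4))
#     loss.backward()
#     opt.step()
```

Detaching the next state is a deliberate simplification so gradients flow only through the sampled actions; the geometry-aware gradient the paper derives via Otto's calculus is a different object, which this sketch does not attempt to reproduce.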