AI Navigate

A Spectral Revisit of the Distributional Bellman Operator under the Cramér Metric

arXiv cs.LG / 3/16/2026

Key Points

  • The paper analyzes distributional reinforcement learning at the level of cumulative distribution functions (CDFs) under the Cramér metric, treating this metric as the intrinsic geometry of the problem.
  • It shows that the Bellman update acts affinely on CDFs and linearly on differences between CDFs, with the contraction property giving a uniform bound on this linear action; this is a more structural view than standard contraction analyses provide.
  • The authors construct a family of regularised spectral Hilbert representations that realise the CDF-level geometry by exact conjugation, without modifying the Bellman dynamics. The regularisation affects only the geometry, and the native Cramér metric is recovered in the zero-regularisation limit.
  • This framework clarifies the operator structure of distributional Bellman updates and establishes a foundation for further functional-analytic studies in distributional reinforcement learning.
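
The affine and linear structure described in the points above can be checked numerically. The sketch below is a minimal illustration, not the paper's construction. It assumes a hypothetical two-state Markov reward process with deterministic rewards `r`, transition matrix `P`, discount `gamma`, and logistic return CDFs on a uniform grid; all of these names and values are chosen for illustration only. Under those assumptions the CDF-level update evaluates each successor CDF at `(z - r[s]) / gamma` and averages with the weights `P[s]`.

```python
import numpy as np

def logistic_cdf(z, mean, scale=1.0):
    """CDF of a logistic distribution, used here as a smooth stand-in return CDF."""
    return 1.0 / (1.0 + np.exp(-(z - mean) / scale))

def cramer_distance(F, G, dz):
    """L^2 distance between two CDFs on a uniform grid (Riemann-sum approximation)."""
    return np.sqrt(np.sum((F - G) ** 2) * dz)

def bellman_update(F, z, P, r, gamma, left=0.0, right=1.0):
    """CDF-level update: out[s](z) = sum_t P[s, t] * F[t]((z - r[s]) / gamma).

    `left`/`right` are the values used outside the grid: 0 and 1 for CDFs,
    0 and 0 for differences of CDFs.
    """
    n_states = F.shape[0]
    out = np.empty_like(F)
    for s in range(n_states):
        u = (z - r[s]) / gamma  # pre-image of each grid point under z -> r + gamma * z
        pulled = np.stack(
            [np.interp(u, z, F[t], left=left, right=right) for t in range(n_states)]
        )
        out[s] = P[s] @ pulled
    return out

# Illustrative (assumed) problem data.
z = np.linspace(-20.0, 20.0, 4001)
dz = z[1] - z[0]
P = np.array([[0.7, 0.3], [0.4, 0.6]])
r = np.array([1.0, -0.5])
gamma = 0.9

F = np.stack([logistic_cdf(z, 0.0), logistic_cdf(z, 2.0)])
G = np.stack([logistic_cdf(z, 1.0), logistic_cdf(z, -1.0)])

TF = bellman_update(F, z, P, r, gamma)
TG = bellman_update(G, z, P, r, gamma)

# Affine action: mixtures of CDFs are mapped to the same mixture of updates.
lam = 0.3
mix_gap = np.max(np.abs(
    bellman_update(lam * F + (1 - lam) * G, z, P, r, gamma)
    - (lam * TF + (1 - lam) * TG)
))

# Linear action on differences: TF - TG equals the linear part applied to F - G.
lin_gap = np.max(np.abs(
    (TF - TG) - bellman_update(F - G, z, P, r, gamma, left=0.0, right=0.0)
))

# Contraction in the supremum-over-states Cramér metric.
d_before = max(cramer_distance(F[s], G[s], dz) for s in range(2))
d_after = max(cramer_distance(TF[s], TG[s], dz) for s in range(2))
ratio = d_after / d_before

print(f"mixture gap: {mix_gap:.2e}, linearity gap: {lin_gap:.2e}")
print(f"contraction ratio: {ratio:.4f} (sqrt(gamma) = {np.sqrt(gamma):.4f})")
```

For this setup both gaps should sit at floating-point level, and the ratio should stay below `sqrt(gamma)`, the classical contraction factor under the Cramér metric.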

Abstract

Distributional reinforcement learning (DRL) studies the evolution of full return distributions under Bellman updates rather than focusing on expected values. A classical result is that the distributional Bellman operator is contractive under the Cramér metric, which corresponds to an L^2 geometry on differences of cumulative distribution functions (CDFs). While this contraction ensures stability of policy evaluation, existing analyses remain largely metric, focusing on contraction properties without elucidating the structural action of the Bellman update on distributions. In this work, we analyse distributional Bellman dynamics directly at the level of CDFs, treating the Cramér geometry as the intrinsic analytical setting. At this level, the Bellman update acts affinely on CDFs and linearly on differences between CDFs, and its contraction property yields a uniform bound on this linear action. Building on this intrinsic formulation, we construct a family of regularised spectral Hilbert representations that realise the CDF-level geometry by exact conjugation, without modifying the underlying Bellman dynamics. The regularisation affects only the geometry and vanishes in the zero-regularisation limit, recovering the native Cramér metric. This framework clarifies the operator structure underlying distributional Bellman updates and provides a foundation for further functional and operator-theoretic analyses in DRL.
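
To make the abstract's statement about affine action on CDFs and linear action on differences concrete, here is a worked form under simplifying assumptions that are not taken from the paper: a finite state space, a fixed policy folded into a transition kernel $P$, deterministic rewards $r_s$, and a discount $\gamma \in (0,1)$. The symbols $T$, $L$, $F_s$ and $D_s$ are illustrative notation and may differ from the authors'.

```latex
% CDF-level Bellman update (assumed setting: deterministic reward r_s, kernel P)
(TF)_s(z) = \sum_{s'} P(s' \mid s)\, F_{s'}\!\left(\frac{z - r_s}{\gamma}\right)

% Affine action: mixtures of CDFs are preserved, for \lambda \in [0,1]
T\bigl(\lambda F + (1-\lambda) G\bigr) = \lambda\, TF + (1-\lambda)\, TG

% Linear action on differences D = F - G
TF - TG = L(F - G), \qquad
(LD)_s(z) = \sum_{s'} P(s' \mid s)\, D_{s'}\!\left(\frac{z - r_s}{\gamma}\right)

% Uniform bound: substitute u = (z - r_s)/\gamma, then apply Jensen's inequality
\|(LD)_s\|_{L^2}^2
  = \gamma \int \Bigl(\sum_{s'} P(s' \mid s)\, D_{s'}(u)\Bigr)^{2} du
  \le \gamma \max_{s'} \|D_{s'}\|_{L^2}^2
```

Taking the supremum over states gives $\sup_s \|(LD)_s\|_{L^2} \le \sqrt{\gamma}\, \sup_s \|D_s\|_{L^2}$, which is the classical $\sqrt{\gamma}$-contraction in the Cramér metric read as a uniform bound on the linear action.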