Fisher Decorator: Refining Flow Policy via a Local Transport Map

arXiv cs.RO / 4/21/2026


Key Points

  • The paper targets a limitation of existing flow-based offline reinforcement learning policies: they treat L2 regularization as an upper bound on the 2-Wasserstein distance, which can misalign optimization directions in offline settings.
  • It reframes policy refinement geometrically by treating the update as a local transport map (an initial flow policy plus a residual displacement) that corrects the optimization direction.
  • By studying how the policy-induced density transforms, the authors derive a local quadratic approximation of a KL-constrained objective using the Fisher information matrix, yielding an anisotropic (direction-aware) optimization problem.
  • The method uses the score function embedded in the flow velocity to form a corresponding quadratic constraint, enabling efficient optimization.
  • Experiments on multiple offline RL benchmarks show state-of-the-art performance, and the theory explains that prior methods’ suboptimality comes from their isotropic approximations.
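The anisotropy argument in the bullets above can be illustrated with a toy Gaussian example (a sketch for intuition only, not the paper's method; all quantities here are hypothetical). For a mean shift of a Gaussian with fixed covariance, the KL divergence is exactly the Fisher-weighted quadratic form ½ δᵀΣ⁻¹δ, so two steps with identical L2 norm can have very different KL cost:

```python
import numpy as np

def gaussian_kl_mean_shift(delta, cov):
    """Exact KL(N(mu, cov) || N(mu + delta, cov)) for a shared covariance:
    KL = 0.5 * delta^T cov^{-1} delta."""
    return 0.5 * delta @ np.linalg.inv(cov) @ delta

# A toy anisotropic behavior distribution: much tighter in the second coordinate.
cov = np.diag([1.0, 0.01])

step = 0.1
d1 = np.array([step, 0.0])  # move along the "wide" direction
d2 = np.array([0.0, step])  # move along the "narrow" direction

# Both steps have identical L2 norm, so an isotropic (L2 / W2-upper-bound)
# penalty charges them equally ...
assert np.isclose(np.linalg.norm(d1), np.linalg.norm(d2))

# ... but the KL cost (the Fisher-weighted quadratic form) differs by 100x,
# which is the direction-sensitivity an isotropic regularizer cannot see.
kl1 = gaussian_kl_mean_shift(d1, cov)
kl2 = gaussian_kl_mean_shift(d2, cov)
print(kl1, kl2)  # 0.005 vs 0.5
```

This is the sense in which an isotropic L2 ball is "density-insensitive": it treats both displacements as equally safe even though one leaves the high-density region far faster.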

Abstract

Recent advances in flow-based offline reinforcement learning (RL) have achieved strong performance by parameterizing policies via flow matching. However, they still face critical trade-offs among expressiveness, optimality, and efficiency. In particular, existing flow policies interpret the L_2 regularization as an upper bound on the 2-Wasserstein distance (W_2), which can be problematic in offline settings. This issue stems from a fundamental geometric mismatch: the behavioral policy manifold is inherently anisotropic, whereas the L_2 (or W_2 upper-bound) regularization is isotropic and density-insensitive, leading to systematically misaligned optimization directions. To address this, we revisit offline RL from a geometric perspective and show that policy refinement can be formulated as a local transport map: an initial flow policy augmented by a residual displacement. By analyzing the induced density transformation, we derive a local quadratic approximation of the KL-constrained objective governed by the Fisher information matrix, enabling a tractable anisotropic optimization formulation. By leveraging the score function embedded in the flow velocity, we obtain a corresponding quadratic constraint for efficient optimization. Our results reveal that the optimality gap in prior methods arises from their isotropic approximation. In contrast, our framework achieves a controllable approximation error within a provable neighborhood of the optimal solution. Extensive experiments demonstrate state-of-the-art performance across diverse offline RL benchmarks. See project page: https://github.com/ARC0127/Fisher-Decorator.
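For reference, the "local quadratic approximation of the KL-constrained objective governed by the Fisher information matrix" mentioned in the abstract is, in its standard textbook form, the second-order expansion of the KL divergence around the current parameters (the paper's exact formulation for flow policies may differ; this is the generic identity):

```latex
\mathrm{KL}\!\left(p_{\theta}\,\|\,p_{\theta+\delta}\right)
  = \tfrac{1}{2}\,\delta^{\top} F(\theta)\,\delta + O(\|\delta\|^{3}),
\qquad
F(\theta) = \mathbb{E}_{x \sim p_{\theta}}\!\left[
  \nabla_{\theta}\log p_{\theta}(x)\,\nabla_{\theta}\log p_{\theta}(x)^{\top}
\right].
```

Because \(F(\theta)\) is generally non-isotropic, a KL trust region \(\tfrac{1}{2}\delta^{\top}F\delta \le \epsilon\) is an ellipsoid rather than the sphere \(\|\delta\|_2^2 \le \epsilon\) implied by an L_2 (or W_2 upper-bound) penalty, which is the geometric gap the paper's anisotropic formulation is designed to close.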