Recovering Hidden Reward in Diffusion-Based Policies
arXiv cs.RO / 5/4/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- The paper proposes EnergyFlow, a framework that links diffusion-based generative action modeling with inverse reinforcement learning via a learned scalar energy function whose gradient corresponds to the denoising field.
- It shows (under maximum-entropy optimality) that denoising score matching recovers the gradient of the expert’s soft Q-function, enabling reward extraction without adversarial IRL training.
- The authors prove that forcing the learned field to be conservative lowers hypothesis complexity and improves out-of-distribution generalization bounds, while also analyzing reward identifiability.
- They bound how score estimation errors affect recovered action preferences and report state-of-the-art imitation results on multiple manipulation tasks.
- EnergyFlow’s extracted reward is also reported to improve downstream reinforcement learning performance, outperforming both adversarial IRL and likelihood-based alternatives, with code released on GitHub.
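The core link the paper draws — that denoising score matching recovers the gradient of a scalar energy (the expert's soft Q-function) — can be illustrated on a toy problem. This is a generic DSM sketch, not EnergyFlow's implementation: for a 1-D Gaussian "expert" action distribution, the score of the noised marginal is the gradient of a quadratic energy, and a linear model fit to the standard DSM regression target recovers it.

```python
import numpy as np

# Toy illustration (NOT the paper's code): for expert actions a ~ N(mu, s^2),
# the noised marginal at noise level sigma is N(mu, s^2 + sigma^2), whose
# score is -dE/da for the quadratic energy
#   E(a) = (a - mu)^2 / (2 * (s^2 + sigma^2)).
# Denoising score matching should recover that gradient field.

rng = np.random.default_rng(0)
mu, s, sigma = 1.5, 0.5, 0.3            # expert mean/std and noise level (made up)
n = 200_000

a = rng.normal(mu, s, n)                # "expert" actions
a_noisy = a + rng.normal(0.0, sigma, n)

# DSM regression target: (a - a_noisy) / sigma^2 is an unbiased estimate
# of the score of the noised marginal evaluated at a_noisy.
target = (a - a_noisy) / sigma**2

# Fit score(a_noisy) = w * a_noisy + b by least squares; the true score of a
# Gaussian is linear, so this model class contains it.
X = np.stack([a_noisy, np.ones(n)], axis=1)
w, b = np.linalg.lstsq(X, target, rcond=None)[0]

# Analytic score: -(a_noisy - mu) / (s^2 + sigma^2), i.e. -dE/da.
var = s**2 + sigma**2
print("fitted slope", w, "vs analytic", -1.0 / var)
print("fitted intercept", b, "vs analytic", mu / var)
```

In this 1-D case the fitted field is trivially conservative (any 1-D field is a gradient); the paper's conservativeness constraint matters in higher-dimensional action spaces, where an unconstrained denoiser need not be the gradient of any scalar energy.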