Flow Matching Policy with Entropy Regularization
arXiv cs.LG, March 19, 2026
Key Points
- FMER introduces an online RL framework based on ordinary differential equations (ODEs), parameterizing the policy via flow matching and sampling actions along a straight probability path.
- It derives a tractable entropy objective to enable principled maximum-entropy optimization for improved exploration.
- The method leverages an advantage-weighted target velocity field derived from a candidate set to steer policy updates toward high-value regions, exploiting the model's generative nature.
- Empirical results on sparse multi-goal FrankaKitchen benchmarks show FMER outperforms state-of-the-art methods and remains competitive on MuJoCo, while reducing training time (about 7x faster than heavy diffusion baselines like QVPO and 10-15% faster than efficient variants).
- The findings suggest meaningful gains in sample efficiency and computation for diffusion-based RL, with potential impact on robotics and other AI-controlled systems.
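To make the core ideas concrete, here is a minimal sketch of the two mechanisms the summary describes: sampling an action by Euler-integrating an ODE whose velocity field follows a straight probability path, and an advantage-weighted flow-matching loss over a candidate action set. This is illustrative only; the names (`velocity`, `sample_action`, `awfm_loss`) and the toy linear velocity model are assumptions, not the paper's actual architecture or API.

```python
import numpy as np

rng = np.random.default_rng(0)
ACT_DIM = 2

# Hypothetical velocity field v_theta(x, t): a fixed linear map standing in
# for the learned network (an assumption for this sketch).
W = rng.normal(scale=0.1, size=(ACT_DIM, ACT_DIM + 1))

def velocity(x, t):
    """Predicted velocity at point x, time t."""
    inp = np.concatenate([x, [t]])
    return W @ inp

def sample_action(steps=10):
    """Sample an action by Euler-integrating dx/dt = v_theta(x, t) from noise."""
    x = rng.standard_normal(ACT_DIM)  # x_0 ~ N(0, I)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x  # x_1: the sampled action

def awfm_loss(x0, candidates, advantages, t):
    """Advantage-weighted flow-matching loss over a candidate action set.

    On a straight probability path x_t = (1 - t) * x0 + t * x1, the target
    velocity is simply x1 - x0; candidates are weighted by softmax(advantage)
    so updates are steered toward high-value regions.
    """
    w = np.exp(advantages - advantages.max())
    w /= w.sum()
    loss = 0.0
    for x1, wi in zip(candidates, w):
        xt = (1 - t) * x0 + t * x1
        target = x1 - x0
        loss += wi * np.sum((velocity(xt, t) - target) ** 2)
    return loss
```

Because the path is straight, the regression target is a constant displacement `x1 - x0` rather than a time-dependent score, which is part of why such samplers can be much cheaper than many-step diffusion baselines.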
