How to Guide Your Flow: Few-Step Alignment via Flow Map Reward Guidance

arXiv cs.LG / 5/1/2026

📰 News · Tools & Practical Usage · Models & Research

Key Points

  • The paper tackles “guidance” in generative modeling by framing reward maximization (e.g., aesthetic quality or human preference alignment) as a deterministic optimal control problem (see the formalization sketched after this list).
  • It introduces a hierarchy of algorithms that generalizes existing guidance approaches, with the flow map emerging naturally as part of the optimal solution.
  • Based on this, the authors propose Flow Map Reward Guidance (FMRG), a training-free method that uses a single trajectory and the flow map to both integrate and guide the generative flow.
  • Experiments at text-to-image scale show FMRG can match or outperform baselines across multiple inverse problems, style transfer, human preferences, and VLM rewards using as few as 3 NFEs (network function evaluations), achieving at least an order-of-magnitude speedup over the prior state of the art.
  • Overall, the work offers a principled, efficient alternative to expensive multi-step or poorly understood approximation-based guidance methods.
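
The summary does not spell out the mathematics, but reward guidance as deterministic optimal control is conventionally posed as steering a pretrained flow toward high-reward terminal samples. The sketch below gives that standard formulation together with the usual definition of the flow map; the symbols $v$ (velocity field), $u$ (control), $r$ (reward), and $\lambda$ (control-cost weight) are assumed notation, not taken from the paper.

```latex
% Reward guidance as deterministic optimal control (assumed notation, not
% the paper's exact formulation): add a control u_t to the pretrained flow
% so the terminal sample X_1 maximizes the reward r at minimal control cost.
\max_{u}\; r(X_1) \;-\; \frac{\lambda}{2}\int_0^1 \lVert u_t \rVert^2 \,\mathrm{d}t
\qquad \text{s.t.} \qquad \dot X_t = v_t(X_t) + u_t, \quad X_0 = x_0.

% The flow map X_{s,t} transports a state from time s to time t along v,
% which is what makes few-step integration possible:
\partial_t X_{s,t}(x) = v_t\bigl(X_{s,t}(x)\bigr), \qquad X_{s,s}(x) = x.
```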

Abstract

In generative modeling, we often wish to produce samples that maximize a user-specified reward such as aesthetic quality or alignment with human preferences, a problem known as guidance. Despite their widespread use, existing guidance methods either require expensive multi-particle, many-step schemes or rely on poorly understood approximations. We reformulate guidance as a deterministic optimal control problem, yielding a hierarchy of algorithms that subsumes existing approaches at the coarsest level. We show that the flow map, an object of significant recent interest for its role in fast inference, arises naturally in the optimal solution. Based on this observation, we propose Flow Map Reward Guidance (FMRG): a training-free, single-trajectory framework that uses the flow map to both integrate and guide the flow. At text-to-image scale, FMRG matches or surpasses baselines across inverse problems, style transfer, human preferences, and VLM rewards with as few as 3 NFEs, giving at least an order-of-magnitude speedup over the prior state of the art.
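
To make the single-trajectory idea concrete, here is a minimal sketch of what a flow-map-guided sampling loop could look like. This is not the paper's algorithm: `flow_map`, `reward`, `guidance_scale`, and the time grid are hypothetical placeholders, and the update rule (a reward-gradient nudge obtained by differentiating through the flow map's terminal prediction) is just one plausible reading of "using the flow map to both integrate and guide the flow."

```python
import torch

def fmrg_sample(flow_map, reward, x0, times, guidance_scale=1.0):
    """Sketch of flow-map-based reward guidance (hypothetical interface).

    flow_map(x, s, t): learned flow map transporting x from time s to time t
                       (e.g., a distilled few-step model); assumed differentiable.
    reward(x):         scalar reward on terminal samples (aesthetic score, etc.).
    x0:                initial noise, shape (batch, ...).
    times:             increasing grid from 0 to 1, e.g. [0.0, 0.4, 0.8, 1.0].
    """
    x = x0
    for s, t in zip(times[:-1], times[1:]):
        # Differentiate the reward of the flow map's one-jump terminal
        # prediction with respect to the current state.
        x_in = x.detach().requires_grad_(True)
        x1_pred = flow_map(x_in, s, 1.0)
        grad = torch.autograd.grad(reward(x1_pred).sum(), x_in)[0]

        # Advance along the flow map and nudge toward higher reward.
        with torch.no_grad():
            x = flow_map(x_in, s, t) + guidance_scale * (t - s) * grad
    return x
```

Each iteration costs a couple of flow-map evaluations (one differentiated jump for guidance, one for integration), so a coarse three- or four-point time grid stays in the few-NFE regime the summary describes.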