LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories
arXiv cs.CV / 4/17/2026
Key Points
- The paper proposes LeapAlign, a fine-tuning approach for flow matching models that aligns them with human preferences via reward-gradient backpropagation through the generation process.
- Direct backpropagation through long ODE trajectories is shown to be impractical due to high memory usage and gradient explosion, which restricts reward updates to only the early generation steps.
- LeapAlign reduces the long trajectory to a two-step “leap” design, where each leap skips multiple ODE sampling steps and predicts future latents in one shot.
- By randomizing the leap start/end timesteps and reweighting training trajectories based on consistency with the long generation path (while dampening large-magnitude gradient terms), LeapAlign enables stable and efficient updates at any generation step.
- Experiments fine-tuning the Flux model demonstrate that LeapAlign outperforms existing GRPO-based and direct-gradient methods on image quality and image-text alignment metrics.
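The core mechanics described above can be illustrated with a minimal PyTorch sketch. Everything here is a toy stand-in, not the paper's implementation: `VelocityNet`, `leap`, and `leapalign_step` are hypothetical names, the consistency-based trajectory reweighting is omitted, and plain gradient clipping stands in for the paper's dampening of large-magnitude gradient terms. The key idea shown is replacing many small ODE steps with a two-step "leap" trajectory (random start/end timesteps) through which the reward gradient is backpropagated.

```python
import torch

class VelocityNet(torch.nn.Module):
    """Toy velocity field v(x, t) standing in for a flow matching model."""
    def __init__(self, dim=4):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 32), torch.nn.SiLU(), torch.nn.Linear(32, dim)
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

def leap(model, x, t_start, t_end):
    """One Euler 'leap' that jumps from t_start to t_end in a single model
    call, skipping the intermediate ODE sampling steps."""
    t = torch.full((x.shape[0], 1), t_start)
    return x + (t_end - t_start) * model(x, t)

def leapalign_step(model, reward_fn, opt, dim=4, grad_clip=1.0):
    # Randomize the leap start/end timesteps so any part of the
    # generation trajectory can receive reward updates.
    t0, t1 = sorted(torch.rand(2).tolist())
    x0 = torch.randn(8, dim)
    with torch.no_grad():  # reach t0 without tracking gradients
        x_t0 = leap(model, x0, 0.0, t0)
    # Two-step trajectory t0 -> t1 -> 1; gradients flow through both leaps.
    x_t1 = leap(model, x_t0, t0, t1)
    x_1 = leap(model, x_t1, t1, 1.0)
    loss = -reward_fn(x_1).mean()  # maximize reward on the final latents
    opt.zero_grad()
    loss.backward()
    # Clipping as a crude proxy for dampening large-magnitude gradient terms.
    torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
    opt.step()
    return loss.item()
```

Because only two model calls sit on the gradient path regardless of how many ODE steps a leap skips, memory cost stays constant, which is what makes updating at any generation step tractable.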


