SteerFlow: Steering Rectified Flows for Faithful Inversion-Based Image Editing

arXiv cs.CV / 4/3/2026


Key Points

  • The paper proposes SteerFlow, a model-agnostic framework for faithful inversion-based, text-guided image editing that improves source fidelity over existing approaches.
  • SteerFlow’s forward stage uses an Amortized Fixed-Point Solver to straighten the generative trajectory by enforcing velocity consistency across timesteps, producing a higher-fidelity inverted latent.
  • Its backward stage introduces Trajectory Interpolation, adaptively blending editing and source-reconstruction velocities to keep edits anchored to the original image and reduce drift.
  • To better preserve backgrounds, SteerFlow adds Adaptive Masking that spatially constrains the editing signal using concept-guided segmentation and velocity differences between source and target.
  • Experiments on FLUX.1-dev and Stable Diffusion 3.5 Medium report consistently better editing quality than prior methods and show support for multi-turn editing without accumulating drift.
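The fixed-point idea behind the forward stage can be sketched in a few lines. In rectified-flow inversion, a naive Euler step evaluates the velocity only at the current latent; a fixed-point refinement re-evaluates it at the updated latent until the two agree, which is the "velocity consistency across timesteps" notion the paper builds on. The sketch below is a generic fixed-point inversion step with a toy velocity field, not the paper's amortized solver; `toy_velocity`, the iteration count, and all names are illustrative assumptions.

```python
import numpy as np

def toy_velocity(x, t):
    # Stand-in for the learned flow model's velocity field v(x, t).
    # A real editor would call the FLUX.1-dev / SD 3.5 network here.
    return -np.sin(x) * (1.0 - t)

def fixed_point_inversion_step(x_t, t, dt, velocity_fn, n_iters=3):
    """One inversion step from t to t + dt via fixed-point iteration.

    Starts from the explicit Euler guess, then repeatedly re-evaluates
    the velocity at the updated latent so that x_next satisfies
    x_next = x_t + dt * v(x_next, t + dt) (an implicit step).
    """
    x_next = x_t + dt * velocity_fn(x_t, t)  # Euler initial guess
    for _ in range(n_iters):
        x_next = x_t + dt * velocity_fn(x_next, t + dt)  # refine
    return x_next

x0 = np.array([0.5, -0.2])
x1 = fixed_point_inversion_step(x0, t=0.0, dt=0.1, velocity_fn=toy_velocity)
```

Each refinement reuses the same model call signature, so in the real setting the extra iterations trade a few more inferences for a latent whose forward trajectory is straighter and easier to re-trace.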

Abstract

Recent advances in flow-based generative models have enabled training-free, text-guided image editing by inverting an image into its latent noise and regenerating it under new target-conditioned guidance. However, existing methods struggle to preserve source fidelity: higher-order solvers incur additional model inferences, truncated inversion constrains editability, and feature injection methods lack architectural transferability. To address these limitations, we propose SteerFlow, a model-agnostic editing framework with strong theoretical guarantees on source fidelity. In the forward process, we introduce an Amortized Fixed-Point Solver that implicitly straightens the forward trajectory by enforcing velocity consistency across consecutive timesteps, yielding a high-fidelity inverted latent. In the backward process, we introduce Trajectory Interpolation, which adaptively blends target-editing and source-reconstruction velocities to keep the editing trajectory anchored to the source. To further improve background preservation, we introduce an Adaptive Masking mechanism that spatially constrains the editing signal with concept-guided segmentation and source-target velocity differences. Extensive experiments on FLUX.1-dev and Stable Diffusion 3.5 Medium demonstrate that SteerFlow consistently achieves better editing quality than existing methods. Finally, we show that SteerFlow extends naturally to a complex multi-turn editing paradigm without accumulating drift.
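The backward-stage ideas (Trajectory Interpolation plus Adaptive Masking) amount to steering each denoising step with a velocity that mixes the edit and reconstruction directions, with the mask keeping the background on the reconstruction path. The linear blend, the `alpha` weight, and all names below are illustrative assumptions; the paper's adaptive weighting and concept-guided mask computation are not reproduced here.

```python
import numpy as np

def blended_velocity(v_edit, v_recon, alpha, mask=None):
    """Blend target-editing and source-reconstruction velocities.

    alpha in [0, 1] sets how strongly the trajectory follows the edit;
    mask (same shape as the latents, values in [0, 1]) optionally
    confines the edit signal to the edited region so the background
    keeps following the source-reconstruction velocity.
    """
    v = alpha * v_edit + (1.0 - alpha) * v_recon
    if mask is not None:
        v = mask * v + (1.0 - mask) * v_recon
    return v

# Toy usage: with mask = 0 everywhere, the step reduces to pure
# source reconstruction regardless of alpha.
v_e = np.array([1.0, 1.0])
v_r = np.array([0.0, 0.0])
v = blended_velocity(v_e, v_r, alpha=0.8, mask=np.zeros(2))
```

In a real sampler this blended velocity would replace the raw edit velocity inside each integration step, which is what keeps multi-step (and multi-turn) edits anchored to the source latent.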