HP-Edit: A Human-Preference Post-Training Framework for Image Editing

arXiv cs.CV / 4/22/2026

📰 News · Models & Research

Key Points

  • The paper introduces HP-Edit, a post-training framework aimed at aligning diffusion-based image editing outputs with human preferences.
  • It addresses RLHF for image editing by proposing an automated human-preference scorer (HP-Scorer) built from a small amount of human preference scoring data and a pretrained visual language model (VLM).
  • HP-Scorer is used both to generate a scalable preference dataset and to provide a reward signal for post-training the image editing model.
  • The work also releases RealPref-50K, a real-world dataset covering eight common editing tasks with balanced coverage of common object editing, and RealPref-Bench, a benchmark for evaluating real-world editing quality.
  • Experiments show that HP-Edit substantially improves alignment with human preferences for models such as Qwen-Image-Edit-2509.
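The dual role of HP-Scorer described above, ranking candidate edits to build preference pairs and serving directly as a reward signal, can be sketched as follows. This is a minimal illustrative outline, not the paper's implementation: `hp_score` is a toy stand-in for the VLM-based scorer, and the function names are hypothetical.

```python
def hp_score(instruction: str, edited_image: str) -> float:
    """Toy stand-in for the VLM-based HP-Scorer: returns a
    preference score in [0, 1] for an edited image.
    (Placeholder heuristic for illustration only.)"""
    return min(len(edited_image) / 10.0, 1.0)

def build_preference_pair(instruction: str, candidates: list[str]):
    """Role 1: rank candidate edits with the scorer to form a
    (chosen, rejected) pair for DPO-style preference training."""
    ranked = sorted(candidates,
                    key=lambda c: hp_score(instruction, c),
                    reverse=True)
    return ranked[0], ranked[-1]

def reward(instruction: str, candidate: str) -> float:
    """Role 2: use the scorer directly as the reward function
    for RL-style post-training (e.g. a GRPO-like objective)."""
    return hp_score(instruction, candidate)

# Example: pick the preferred edit among two candidates.
chosen, rejected = build_preference_pair(
    "remove the hat", ["abc", "abcdefghij"])
```

In this sketch the same scoring function backs both the offline dataset construction and the online reward, which is the efficiency argument the paper makes: one learned evaluator amortizes the cost of human preference labels across both uses.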

Abstract

Powerful generative diffusion models have become the leading paradigm for real-world image editing. Although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing remains largely unexplored, owing to a lack of scalable human-preference datasets and of frameworks tailored to diverse editing needs. To fill this gap, we propose HP-Edit, a post-training framework for Human Preference-aligned Editing, and introduce RealPref-50K, a real-world dataset spanning eight common tasks with balanced coverage of common object editing. Specifically, HP-Edit leverages a small amount of human-preference scoring data and a pretrained visual large language model (VLM) to develop HP-Scorer, an automatic, human preference-aligned evaluator. We then use HP-Scorer both to efficiently build a scalable preference dataset and to serve as the reward function for post-training the editing model. We also introduce RealPref-Bench, a benchmark for evaluating real-world editing performance. Extensive experiments demonstrate that our approach significantly enhances models such as Qwen-Image-Edit-2509, aligning their outputs more closely with human preferences.