ViPO: Visual Preference Optimization at Scale

arXiv cs.CV / April 29, 2026

Key Points

  • The paper argues that scaling visual preference optimization is difficult because existing preference datasets often contain conflicting/biased signals, causing naive optimization to fail.
  • It proposes Poly-DPO, an extension of DPO that adds a polynomial term to the objective, adapting model confidence to dataset characteristics for more robust learning on noisy or imbalanced data (see the sketch after this list).
  • To address data bottlenecks, the authors release ViPO, a large-scale visual preference dataset with 1M 1024px image pairs across five categories and 300K 720p+ video pairs across three categories, with diverse prompts and balanced distributions.
  • Experiments show that on the proposed high-quality ViPO dataset, the best Poly-DPO configuration converges to standard DPO, which the authors read as evidence of both the dataset's quality and Poly-DPO's adaptivity: the extra polynomial machinery matters mainly when the data is imperfect.
  • On noisy datasets such as Pick-a-Pic V2, Poly-DPO delivers GenEval gains over Diffusion-DPO of 6.87 points for SD1.5 and 2.32 for SDXL, and on ViPO it enables models to outperform those trained on prior open-source preference datasets.
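
The excerpt does not spell out Poly-DPO's exact objective, but standard DPO minimizes the negative log-sigmoid of an implicit reward margin m. One plausible, purely illustrative form of a polynomial extension, in the spirit of PolyLoss, adds a tunable term in (1 − σ(m)); setting ε = 0 then recovers plain DPO, which is consistent with the convergence behavior noted above. The paper's actual term may differ.

```latex
% Standard DPO over preference pairs (x, y_w, y_l):
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}\big[\log \sigma(m)\big],
\quad
m = \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}

% Hypothetical PolyLoss-style extension (\epsilon = 0 recovers DPO):
\mathcal{L}_{\mathrm{Poly\text{-}DPO}}
  = -\,\mathbb{E}\big[\log \sigma(m)\big]
  + \epsilon\,\mathbb{E}\big[(1 - \sigma(m))^{n}\big]
```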

Abstract

While preference optimization is crucial for improving visual generative models, how to effectively scale this paradigm remains largely unexplored. Current open-source preference datasets contain conflicting preference patterns, where winners excel in some dimensions but underperform in others. Naive optimization on such noisy data fails to capture the intended preferences, hindering effective scaling. To enhance robustness against noise, we propose Poly-DPO, which extends the DPO objective with an additional polynomial term that dynamically adjusts model confidence based on dataset characteristics, enabling effective learning across diverse data distributions. Beyond biased patterns, existing datasets suffer from low resolution, limited prompt diversity, and imbalanced distributions. To facilitate large-scale visual preference optimization by tackling data bottlenecks, we construct ViPO, a massive-scale preference dataset with 1M image pairs at 1024px across five categories and 300K video pairs at 720p+ across three categories. State-of-the-art generative models and diverse prompts ensure reliable preference signals with balanced distributions. Remarkably, when we apply Poly-DPO to our high-quality dataset, the optimal configuration converges to standard DPO. This convergence validates both the dataset's quality and Poly-DPO's adaptive nature: sophisticated optimization becomes unnecessary with sufficient data quality, yet remains valuable for imperfect datasets. We validate our approach across visual generation models. On noisy datasets like Pick-a-Pic V2, Poly-DPO achieves GenEval gains of 6.87 and 2.32 over Diffusion-DPO for SD1.5 and SDXL, respectively. When trained on ViPO, models far exceed the performance of those trained on existing open-source preference datasets. These results confirm that addressing both algorithmic adaptability and data quality is essential for scaling visual preference optimization.
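
For concreteness, here is a minimal PyTorch sketch of a DPO loss with a polynomial correction of the kind the abstract describes. The function name, the eps and power parameters, and the exact form of the extra term are illustrative assumptions, not the paper's implementation; a Diffusion-DPO adaptation would further replace the log-likelihood ratios with differences of denoising losses.

```python
import torch
import torch.nn.functional as F

def poly_dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x), shape (B,)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x), shape (B,)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x), shape (B,)
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x), shape (B,)
    beta: float = 0.1,
    eps: float = 0.0,  # hypothetical polynomial coefficient; eps = 0 recovers DPO
    power: int = 1,    # hypothetical polynomial degree
) -> torch.Tensor:
    # Implicit reward margin: beta * (policy-vs-reference log-ratio gap
    # between the preferred and rejected samples).
    margin = beta * (
        (policy_chosen_logps - ref_chosen_logps)
        - (policy_rejected_logps - ref_rejected_logps)
    )
    p = torch.sigmoid(margin)  # model confidence that the winner is preferred
    # Standard DPO term plus an assumed PolyLoss-style correction in (1 - p);
    # tuning eps per dataset is one way to "adjust confidence" as described.
    loss = -F.logsigmoid(margin) + eps * (1.0 - p).pow(power)
    return loss.mean()
```

Tuning eps and the degree per dataset would make the objective more or less aggressive on low-confidence pairs, which is one way to read the abstract's claim that the polynomial term adapts to dataset characteristics.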