ViPO: Visual Preference Optimization at Scale
arXiv cs.CV / 4/29/2026
Key Points
- The paper argues that scaling visual preference optimization is difficult because existing preference datasets often contain conflicting/biased signals, causing naive optimization to fail.
- It proposes Poly-DPO, an extension of DPO that adds a polynomial term to the preference objective, adapting how strongly model confidence is updated to the characteristics of the dataset, for more robust learning under noisy or imbalanced data (see the sketch after this list).
- To address data bottlenecks, the authors release ViPO, a large-scale visual preference dataset with 1M 1024px image pairs across five categories and 300K 720p+ video pairs across three categories, with diverse prompts and balanced distributions.
- Experiments show that on the proposed high-quality ViPO dataset, the best Poly-DPO configuration reduces to standard DPO, indicating that Poly-DPO adapts to data quality and that its improvements matter mainly when the data is imperfect.
- On noisy datasets such as Pick-a-Pic V2, Poly-DPO delivers gains over Diffusion-DPO (6.87 GenEval points for SD1.5 and 2.32 for SDXL), and training on ViPO yields models that outperform those trained on prior open-source preference datasets.
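
The summary does not reproduce the paper's exact Poly-DPO objective, but the core idea of polynomially reshaping the DPO preference margin can be sketched in a few lines of PyTorch. Everything below is a hypothetical instantiation for illustration: the function name `poly_dpo_loss`, the `poly_coeffs` parameterization, and the default values are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def poly_dpo_loss(
    logp_w: torch.Tensor,      # log-prob of preferred samples under the policy
    logp_l: torch.Tensor,      # log-prob of dispreferred samples under the policy
    ref_logp_w: torch.Tensor,  # same quantities under the frozen reference model
    ref_logp_l: torch.Tensor,
    beta: float = 0.1,         # placeholder temperature; diffusion variants often use larger values
    poly_coeffs: tuple = (1.0, 0.0, 0.1),  # hypothetical coefficients a_1..a_3
) -> torch.Tensor:
    """DPO-style preference loss with a polynomial reshaping of the
    implicit-reward margin. The polynomial term here is a guess at the
    general mechanism; the paper's exact Poly-DPO form may differ."""
    # Implicit reward margin, exactly as in standard DPO.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # Polynomial reshaping: m -> a_1*m + a_2*m^2 + a_3*m^3.
    # Higher-order terms change how aggressively large margins are
    # trusted, the kind of knob one would tune to dataset noise.
    shaped = sum(a * margin ** (k + 1) for k, a in enumerate(poly_coeffs))
    # Logistic preference loss on the reshaped margin; with
    # poly_coeffs = (1.0,) this reduces to vanilla DPO.
    return -F.logsigmoid(beta * shaped).mean()
```

Note that setting `poly_coeffs = (1.0,)` recovers standard DPO, which mirrors the key point above: on clean data the best Poly-DPO configuration converges to the vanilla objective.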