From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space
arXiv cs.CV / 3/16/2026
💬 Opinion · Models & Research
Key Points
- MV-GRPO extends Group Relative Policy Optimization by augmenting the condition space with a Condition Enhancer to generate semantically adjacent yet diverse captions, enabling dense multi-view reward mapping for T2I flow models.
- The approach addresses a limitation of single-view evaluation: scoring each sample under a single caption leaves inter-sample relationships underexplored, which can cap alignment performance.
- It computes the original samples' probability distribution conditioned on the new captions and incorporates these signals into training without requiring costly sample regeneration.
- Experimental results show MV-GRPO achieves superior alignment performance compared with state-of-the-art methods.
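The core idea in the bullets above can be sketched as a group-relative advantage computed over a dense sample-by-view reward map rather than a single-view group. The sketch below is a hypothetical illustration, not the paper's implementation: the function name, reward layout, and normalization details are assumptions, and the real method additionally reuses the original samples' likelihoods under the enhanced captions.

```python
# Hypothetical sketch of dense multi-view advantage estimation:
# each generated sample i is scored under K semantically adjacent
# captions (the "augmented condition space"), and the GRPO-style
# advantage is normalized over the whole sample-by-view reward map
# instead of a single-view group.

import statistics

def multi_view_advantages(rewards):
    """rewards[i][k]: reward of sample i scored under caption view k.

    Returns per-sample, per-view advantages normalized by the mean
    and population std of the full group, mirroring GRPO's
    group-relative baseline but over the dense reward map.
    """
    flat = [r for row in rewards for r in row]
    mu = statistics.fmean(flat)
    sigma = statistics.pstdev(flat) or 1.0  # guard against zero std
    return [[(r - mu) / sigma for r in row] for row in rewards]
```

For example, with two samples each scored under two caption views, `multi_view_advantages([[1.0, 3.0], [3.0, 5.0]])` rewards the best sample-view pair and penalizes the worst relative to the shared group baseline, with no extra sample generation needed.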