Graph-GRPO: Training Graph Flow Models with Reinforcement Learning

arXiv cs.LG / 3/12/2026



Key Points

Graph-GRPO introduces an online reinforcement learning framework that trains Graph Flow Models (GFMs) with verifiable rewards, targeting the challenge of aligning them with task-specific objectives and human preferences.
It derives an analytical expression for the transition probability of GFMs, which replaces Monte Carlo sampling and enables fully differentiable rollouts for RL training.
A refinement strategy randomly perturbs selected nodes and edges and then regenerates them, enabling localized exploration and self-improvement of generation quality.
With only 50 denoising steps, Graph-GRPO reaches Valid-Unique-Novelty scores of 95.0% on the planar dataset and 97.5% on the tree dataset, and it attains state-of-the-art performance on molecular optimization tasks, outperforming graph-based and fragment-based RL methods as well as classic genetic algorithms.
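The first two points can be illustrated with a minimal sketch under stated assumptions. It treats every node type and edge type as an independent categorical variable and assumes the linear mixture path from discrete flow matching (kappa_t = t). Under that assumption, an Euler step of size h has the closed-form kernel p(x_{t+h} = j | x_t) = (1 - h/(1 - t)) * 1[j = x_t] + (h/(1 - t)) * p_theta(x_1 = j | x_t), so the log-probability of a sampled transition is exact rather than a Monte Carlo estimate. The paper's actual GFM expression may differ. The sketch then plugs such log-probabilities into the standard GRPO recipe (group-normalized rewards plus a clipped ratio objective). All function names are hypothetical, and details such as KL regularization or per-step weighting in Graph-GRPO are not described in this summary.

```python
import numpy as np

def transition_probs(p_x1, x_t, t, h):
    """Closed-form Euler-step kernel p(x_{t+h} | x_t) for D independent categorical variables.

    Assumption (not confirmed by the paper): linear mixture path kappa_t = t, whose
    marginal rate is u_t(j) = (p_theta(x_1 = j | x_t) - 1[j == x_t]) / (1 - t).
    p_x1: (D, K) denoiser probabilities; x_t: (D,) integer states.
    """
    if not (0.0 <= t < 1.0 and 0.0 < h <= 1.0 - t):
        raise ValueError("need 0 <= t < 1 and 0 < h <= 1 - t")
    w = h / (1.0 - t)                       # mass moved toward the denoiser's prediction
    stay = np.zeros_like(p_x1)
    stay[np.arange(len(x_t)), x_t] = 1.0    # one-hot of the current state
    return (1.0 - w) * stay + w * p_x1      # each row is a valid distribution

def transition_log_prob(p_x1, x_t, x_next, t, h):
    """Exact log p(x_{t+h} = x_next | x_t), summed over the D variables (no sampling)."""
    probs = transition_probs(p_x1, x_t, t, h)
    return float(np.sum(np.log(probs[np.arange(len(x_t)), x_next])))

def group_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage: normalize rewards within one group of samples."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def clipped_surrogate(logp_new, logp_old, adv, clip_eps=0.2):
    """PPO/GRPO-style clipped objective (to be maximized) over log-probability ratios."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))

# Toy usage with made-up numbers: 4 variables, 3 categories each.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
p_x1 = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
x_t = np.array([0, 1, 2, 0])
P = transition_probs(p_x1, x_t, t=0.5, h=0.1)
adv = group_advantages([1.0, 0.0, 0.5, 0.5])
```

In an autodiff framework the same kernel is a smooth function of the denoiser output, which is what allows gradients to flow through every step of a rollout.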

Abstract

Graph generation is a fundamental task with broad applications, such as drug discovery. Recently, discrete flow matching-based graph generation, a.k.a. graph flow model (GFM), has emerged due to its superior performance and flexible sampling. However, effectively aligning GFMs with complex human preferences or task-specific objectives remains a significant challenge. In this paper, we propose Graph-GRPO, an online reinforcement learning (RL) framework for training GFMs under verifiable rewards. Our method makes two key contributions: (1) we derive an analytical expression for the transition probability of GFMs, replacing Monte Carlo sampling and enabling fully differentiable rollouts for RL training; (2) we propose a refinement strategy that randomly perturbs specific nodes and edges in a graph and regenerates them, allowing for localized exploration and self-improvement of generation quality. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of Graph-GRPO. With only 50 denoising steps, our method achieves 95.0% and 97.5% Valid-Unique-Novelty scores on the planar and tree datasets, respectively. Moreover, Graph-GRPO achieves state-of-the-art performance on molecular optimization tasks, outperforming graph-based and fragment-based RL methods as well as classic genetic algorithms.
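The refinement idea in contribution (2) can be sketched as follows. This is an illustration under assumptions the abstract does not confirm: a uniform noise prior, a node-centred selection rule (a random subset of nodes plus every edge incident to them), and a hypothetical integer encoding in which edge type 0 means "no edge". The denoiser that regenerates the masked positions is not shown; the hypothetical helper `merge_regenerated` only enforces that its proposals are accepted inside the perturbed region.

```python
import numpy as np

def perturb_subgraph(node_types, edge_types, frac, n_node_types, n_edge_types, rng):
    """Resample a random fraction of nodes and all edges touching them.

    Assumptions: uniform noise prior; edge_types is symmetric with a zero diagonal
    (0 = "no edge", a hypothetical convention). Returns the noised graph and boolean
    masks marking the positions the model may regenerate.
    """
    n = len(node_types)
    k = max(1, int(round(frac * n)))
    chosen = rng.choice(n, size=k, replace=False)
    node_mask = np.zeros(n, dtype=bool)
    node_mask[chosen] = True
    edge_mask = node_mask[:, None] | node_mask[None, :]   # edges incident to a chosen node
    np.fill_diagonal(edge_mask, False)

    nodes = node_types.copy()
    nodes[node_mask] = rng.integers(0, n_node_types, size=k)
    noise = np.triu(rng.integers(0, n_edge_types, size=(n, n)), 1)
    noise = noise + noise.T                                # symmetric, zero diagonal
    edges = edge_types.copy()
    edges[edge_mask] = noise[edge_mask]
    return nodes, edges, node_mask, edge_mask

def merge_regenerated(nodes, edges, node_mask, edge_mask, new_nodes, new_edges):
    """Keep unperturbed positions fixed; accept the model's proposals only inside the masks."""
    return np.where(node_mask, new_nodes, nodes), np.where(edge_mask, new_edges, edges)

# Toy usage: a 5-node graph with two edges (values are illustrative only).
rng = np.random.default_rng(1)
node_types = np.array([0, 1, 1, 2, 0])
edge_types = np.zeros((5, 5), dtype=int)
edge_types[0, 1] = edge_types[1, 0] = 1
edge_types[2, 3] = edge_types[3, 2] = 2
nodes, edges, nm, em = perturb_subgraph(node_types, edge_types, 0.4, 3, 3, rng)
```

Restricting regeneration to a small region keeps the rest of a good sample intact, which is one plausible reading of why the abstract describes this as localized exploration.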
