PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching

arXiv cs.CL / 3/20/2026

Key Points

PowerFlow introduces a principled, distribution-matching view of unsupervised fine-tuning for LLMs by using GFlowNet as an amortized variational sampler for unnormalized densities.
It adds a length-aware Trajectory-Balance objective to explicitly neutralize the structural length biases inherent in autoregressive generation.
By targeting alpha-power distributions, PowerFlow can sharpen the model (alpha>1) to enhance logical reasoning or flatten it (alpha<1) to unlock expressive creativity.
Experiments show PowerFlow outperforms existing RLIF methods, matches or surpasses supervised baselines, and improves diversity without sacrificing quality, shifting the Pareto frontier in creative tasks.
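
As a rough illustration of the alpha-power idea in the points above, the sketch below raises a toy categorical distribution to a power alpha and renormalizes it. The four-outcome distribution and the helper names `power_distribution` and `entropy` are assumptions made for this example; they are not taken from the paper, which works with full LLM output distributions rather than a small table.

```python
import math

def power_distribution(probs, alpha):
    """Return the alpha-power distribution p_i**alpha / sum_j p_j**alpha."""
    weights = [p ** alpha for p in probs]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical toy distribution over four candidate outputs.
base = [0.5, 0.3, 0.15, 0.05]

sharpened = power_distribution(base, 2.0)   # alpha > 1: mass concentrates on the mode
flattened = power_distribution(base, 0.5)   # alpha < 1: mass spreads toward the tail

print("entropy (alpha=2.0):", entropy(sharpened))
print("entropy (alpha=1.0):", entropy(base))
print("entropy (alpha=0.5):", entropy(flattened))
```

In this toy case the entropy falls for alpha > 1 and rises for alpha < 1, which is the sense in which the summary speaks of sharpening and flattening.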

Abstract

Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradigm for eliciting the latent capabilities of Large Language Models (LLMs) without external supervision. However, current methods rely on heuristic intrinsic rewards, which often lack a well-defined theoretical optimization target and are prone to degenerative biases. In this work, we introduce PowerFlow, a principled framework that reformulates unsupervised fine-tuning as a distribution matching problem. By casting GFlowNet as an amortized variational sampler for unnormalized densities, we propose a length-aware Trajectory-Balance objective that explicitly neutralizes the structural length biases inherent in autoregressive generation. By targeting $\alpha$-power distributions, PowerFlow enables the directional elicitation of the dual nature of LLMs: sharpening the distribution ($\alpha > 1$) to intensify logical reasoning, or flattening it ($\alpha < 1$) to unlock expressive creativity. Extensive experiments demonstrate that PowerFlow consistently outperforms existing RLIF methods, matching or even exceeding supervised GRPO. Furthermore, by mitigating over-sharpening in aligned models, our approach achieves simultaneous gains in diversity and quality, shifting the Pareto frontier in creative tasks.
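
To make the distribution-matching framing more concrete, here is a minimal sketch of the standard (not length-aware) trajectory-balance residual from the GFlowNet literature, with an alpha-power reward. The abstract does not spell out the paper's length-aware correction, so none is reproduced here; the function names, the single-token toy vocabulary, and the choice of reward as alpha times the base log-probability are all assumptions made for illustration.

```python
import math

def trajectory_balance_loss(log_z, token_logprobs, log_reward):
    """Squared trajectory-balance residual for one sampled sequence.

    Autoregressive generation forms a tree (each prefix has exactly one
    parent), so the backward-policy log-probability is zero and drops out.
    """
    log_pf = sum(token_logprobs)  # log-probability of the sequence under the sampler
    return (log_z + log_pf - log_reward) ** 2

def power_log_reward(base_token_logprobs, alpha):
    """Log of the unnormalized alpha-power target: alpha * log p_base(y)."""
    return alpha * sum(base_token_logprobs)

# Hypothetical toy setup: one-token "sequences" over a three-word vocabulary.
alpha = 2.0
base = [0.5, 0.3, 0.2]
z = sum(p ** alpha for p in base)        # true normalizing constant
target = [p ** alpha / z for p in base]  # normalized alpha-power distribution

# A sampler that already equals the target, paired with log_z = log(z),
# gives a zero residual on every sequence.
matched_losses = [
    trajectory_balance_loss(math.log(z), [math.log(q)],
                            power_log_reward([math.log(p)], alpha))
    for q, p in zip(target, base)
]

# A sampler that still equals the base model leaves a nonzero residual.
unmatched_losses = [
    trajectory_balance_loss(math.log(z), [math.log(p)],
                            power_log_reward([math.log(p)], alpha))
    for p in base
]

print(matched_losses)
print(unmatched_losses)
```

The residual vanishes only when the sampler's sequence probability equals the reward divided by the learned normalizer, which is why minimizing it over sampled sequences pushes the sampler toward the unnormalized target.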
