R2-Write: Reflection and Revision for Open-Ended Writing with Deep Reasoning

arXiv cs.CL / 4/6/2026

Key Points

The paper finds that mainstream deep-reasoning LLM approaches deliver only limited improvements on open-ended writing tasks, unlike their stronger gains in verifiable domains such as math.
It attributes the gap to a lack of deep reflection-and-revision behavior during the writing process, which constrains progress on creative and research-style outputs.
The authors introduce R2-Write, an automated framework that synthesizes high-quality reasoning trajectories through iterative writer-judge interaction, so that the trajectories explicitly contain reflection and revision patterns.
To avoid repetitive or low-value self-reflection, they add a process reward mechanism during reinforcement learning that supervises reflection quality, improving both performance and token efficiency.
Experiments across multiple creative writing and deep-research benchmarks show significant improvements, supporting the claim that explicit reflection/revision enables deeper reasoning for open-ended writing.
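
The summary does not spell out how the writer and judge interact, so the following is only a minimal sketch of one plausible loop, not the authors' implementation. The names `synthesize_trajectory`, `toy_writer`, `toy_judge`, the `[draft]`/`[reflection]`/`[revision]` tags, and the `max_rounds` and `accept_score` parameters are all assumptions made for illustration; in practice the writer and judge would be LLM calls rather than the toy functions used here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: a writer maps (prompt, previous draft, critique) to a
# new draft; a judge maps (prompt, draft) to a (score, critique) pair.
Writer = Callable[[str, Optional[str], Optional[str]], str]
Judge = Callable[[str, str], Tuple[float, str]]


@dataclass
class Trajectory:
    steps: List[str] = field(default_factory=list)  # tagged thinking steps
    final_draft: str = ""


def synthesize_trajectory(
    prompt: str,
    writer: Writer,
    judge: Judge,
    max_rounds: int = 3,        # assumed cap on reflection/revision rounds
    accept_score: float = 0.9,  # assumed threshold for accepting a draft
) -> Trajectory:
    """Interleave drafts, judge critiques, and revisions into one trajectory."""
    traj = Trajectory()
    draft = writer(prompt, None, None)
    traj.steps.append(f"[draft] {draft}")
    for _ in range(max_rounds):
        score, critique = judge(prompt, draft)
        if score >= accept_score:
            break  # the judge is satisfied, so stop revising
        traj.steps.append(f"[reflection] {critique}")
        draft = writer(prompt, draft, critique)
        traj.steps.append(f"[revision] {draft}")
    traj.final_draft = draft
    return traj


# Toy stand-ins so the sketch runs without any model.
def toy_writer(prompt: str, previous: Optional[str], critique: Optional[str]) -> str:
    if previous is None:
        return "A short story."
    return previous + " It now has a clearer ending."


def toy_judge(prompt: str, draft: str) -> Tuple[float, str]:
    if "ending" in draft:
        return 1.0, "No further issues."
    return 0.2, "The ending is unclear."
```

Calling `synthesize_trajectory("Write a story", toy_writer, toy_judge)` with these toy functions yields a trajectory of one draft, one reflection, and one revision. The point of the design is that the judge's critique is recorded as a reflection step before the revised draft, so the resulting trajectory shows the reflect-then-revise pattern explicitly.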

Abstract

While deep reasoning with long chain-of-thought has dramatically improved large language models in verifiable domains like mathematics, its effectiveness for open-ended tasks such as writing remains unexplored. In this paper, we conduct a systematic investigation revealing that existing mainstream reasoning models achieve limited gains on open-ended writing tasks. Our further analysis shows that these models lack deep reflection and revision patterns in open-ended writing, resulting in substantially smaller improvements compared to mathematical reasoning tasks. To address this limitation, we introduce R2-Write: an automated framework that synthesizes high-quality thinking trajectories enriched with explicit reflection and revision patterns through iterative writer-judge interaction. To prevent redundant reflections, we design a process reward mechanism that supervises reflection quality during reinforcement learning, improving both performance and token efficiency. Extensive experiments across multiple creative writing and deep-research benchmarks demonstrate significant improvements, validating that explicitly incorporating reflection and revision patterns unlocks deep reasoning capabilities for open-ended writing tasks.
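
The abstract says the process reward supervises reflection quality to prevent redundant reflections, but gives no formula. As a rough illustration only, the sketch below assumes one simple scheme: a reflection that largely repeats an earlier one (measured by word-level Jaccard overlap) receives a fixed penalty, and any other reflection is rewarded by the score gain of the revision that followed it. The function names, the `redundancy_threshold` and `penalty` parameters, and the overlap measure are hypothetical choices, not details from the paper.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def reflection_rewards(
    reflections: List[str],
    score_gains: List[float],           # assumed: judge-score change after each reflection
    redundancy_threshold: float = 0.8,  # assumed cutoff for "redundant"
    penalty: float = -1.0,              # assumed penalty for a redundant reflection
) -> List[float]:
    """Assign one process reward per reflection step."""
    if len(reflections) != len(score_gains):
        raise ValueError("reflections and score_gains must have the same length")
    rewards = []
    for i, (text, gain) in enumerate(zip(reflections, score_gains)):
        # A reflection counts as redundant if it closely matches any earlier one.
        redundant = any(
            jaccard(text, prev) >= redundancy_threshold for prev in reflections[:i]
        )
        rewards.append(penalty if redundant else gain)
    return rewards
```

Under this assumed scheme, a repeated critique is penalized regardless of what follows it, which discourages padding the trajectory with reflections that add tokens but no new information.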