MARS$^2$: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation

arXiv cs.AI / 4/17/2026


Key Points

  • The paper argues that reinforcement learning for code generation can hit a performance ceiling due to limited trajectory diversity during exploration.
  • It reviews how search-enhanced RL helps with exploration but is still constrained by single-agent policy priors, while multi-policy methods often remain disconnected from structured search.
  • MARS$^2$ proposes a unified framework where multiple independently optimized agents collaborate inside a shared tree-structured search environment.
  • It formulates learning using a path-level group advantage with tree-consistent reward shaping to improve credit assignment across complex search trajectories.
  • Experiments on code generation benchmarks show MARS$^2$ improves results across different model combinations and training settings, and the code is released publicly.
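The path-level group advantage in the fourth point can be pictured as a GRPO-style group-relative normalization applied to whole root-to-leaf search paths, with "tree-consistent" shaping tying each internal node's reward to the outcomes of the leaves below it. The sketch below is a minimal illustration under those assumptions; the function names, the subtree-mean shaping rule, and the toy tree are all hypothetical, not taken from the paper.

```python
import statistics

def subtree_leaf_mean(tree, rewards, node):
    """Tree-consistent shaped reward for a node: the mean outcome
    reward of all leaves in its subtree (an assumed instantiation;
    the paper's exact shaping function may differ)."""
    children = tree.get(node, [])
    if not children:                  # leaf: its own outcome reward
        return rewards[node]
    leaf_vals, stack = [], list(children)
    while stack:
        n = stack.pop()
        kids = tree.get(n, [])
        if kids:
            stack.extend(kids)        # descend further into the subtree
        else:
            leaf_vals.append(rewards[n])
    return statistics.fmean(leaf_vals)

def path_group_advantages(path_rewards, eps=1e-6):
    """Group-relative advantage over a group of root-to-leaf paths:
    normalize each path's reward by the group mean and std."""
    mean = statistics.fmean(path_rewards)
    std = statistics.pstdev(path_rewards)
    return [(r - mean) / (std + eps) for r in path_rewards]

# toy search tree: root branches to "a" (two leaves) and leaf "b"
tree = {"root": ["a", "b"], "a": ["a1", "a2"]}
rewards = {"a1": 1.0, "a2": 0.0, "b": 0.0}      # leaf outcome rewards
paths = [["root", "a", "a1"], ["root", "a", "a2"], ["root", "b"]]
advantages = path_group_advantages([rewards[p[-1]] for p in paths])
```

Under this shaping, sibling paths sharing a prefix inherit comparable node-level signals, which is one way credit could propagate consistently through the tree.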

Abstract

Reinforcement learning (RL) paradigms have demonstrated strong performance on reasoning-intensive tasks such as code generation. However, limited trajectory diversity often leads to diminishing returns, which constrains the achievable performance ceiling. Search-enhanced RL alleviates this issue by introducing structured exploration, but it remains constrained by single-agent policy priors. Meanwhile, leveraging multiple interacting policies can yield more diverse exploratory signals, yet existing approaches are typically decoupled from structured search. We propose **MARS$^2$** (Multi-Agent Reinforced Tree-Search Scaling), a unified RL framework in which multiple independently optimized agents collaborate within a shared tree-structured search environment. MARS$^2$ models the search tree as a learnable multi-agent interaction environment, enabling heterogeneous agents to collaboratively generate and refine candidate solutions within a shared search topology. To support effective learning, we introduce a path-level group advantage formulation based on tree-consistent reward shaping, which facilitates credit assignment across complex search trajectories. Experiments on code generation benchmarks show that MARS$^2$ consistently improves performance across diverse model combinations and training settings, demonstrating the effectiveness of coupling multi-agent collaboration with tree search for enhancing reinforcement learning. Our code is publicly available at https://github.com/TsinghuaC3I/MARTI.
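The abstract's core loop, heterogeneous agents jointly expanding one shared search tree, can be sketched as below. Everything here is an assumption for illustration: agents are modeled as plain functions that extend a partial solution, and candidates are pruned by an external scoring function, whereas the paper's agents are RL-trained policies with a learned selection mechanism.

```python
def multi_agent_tree_search(agents, score, root, depth=3, width=2):
    """Hypothetical sketch of shared-tree expansion: at each frontier
    node, every agent proposes one continuation, and only the top
    `width` candidates by `score` are kept as children."""
    tree = {root: []}
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            # each agent proposes a candidate continuation of this node
            candidates = [agent(node) for agent in agents]
            # keep the best candidates; they become the node's children
            kept = sorted(candidates, key=score, reverse=True)[:width]
            tree[node] = kept
            next_frontier.extend(kept)
        frontier = next_frontier
    return tree
```

Because all agents write into the same `tree`, each one's proposals become context that the others refine at the next depth, which is the kind of cross-policy diversity the abstract argues single-agent search lacks.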