RTMC: Step-Level Credit Assignment via Rollout Trees

arXiv cs.LG / 4/14/2026



Key Points

RTMC (Rollout-Tree Monte Carlo) targets fine-grained credit assignment in multi-step agentic reinforcement learning, where critic-free methods such as GRPO assign the same advantage to every action in a trajectory.
The approach leverages the observation that multiple rollouts for the same problem often share overlapping intermediate states, forming a rollout tree that enables grouping rollouts by common states.
RTMC estimates per-step Q-values and advantages by aggregating return statistics across rollouts sharing a matched state, while avoiding a learned critic to reduce overhead and fragility under sparse rewards.
A state-action signature system is introduced to compress interaction histories into compact representations, making cross-rollout state matching feasible.
On SWE-bench Verified, RTMC improves pass@1 by 3.2 percentage points over GRPO, indicating stronger step-level learning for code-generation agents.

Abstract

Multi-step agentic reinforcement learning benefits from fine-grained credit assignment, yet existing approaches offer limited options: critic-free methods like GRPO assign a uniform advantage to every action in a trajectory, while learned value networks introduce notable overhead and can be fragile under sparse rewards. We observe that group rollouts targeting the same problem often traverse overlapping intermediate states, implicitly forming a tree whose branches diverge at successive decision points. Building on this insight, we introduce Rollout-Tree Monte Carlo (RTMC) advantage estimation, which aggregates return statistics across rollouts sharing a common state to produce per-step Q-values and advantages--without any learned critic. A state-action signature system compresses raw interaction histories into compact, comparable representations, making cross-rollout state matching tractable. On SWE-bench Verified, RTMC improves pass@1 by 3.2 percentage points over GRPO.
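
The abstract does not spell out the estimator, so the following is only a minimal sketch of how advantages could be computed from a rollout tree, under stated assumptions rather than the authors' implementation. It assumes a deterministic environment (so the action prefix identifies the state), a single terminal return per rollout, Q(s, a) estimated as the mean return of rollouts that took action a in state s, V(s) as the mean return of all rollouts that visited s, and advantage A = Q - V. The names `Rollout`, `signature`, and `rtmc_advantages` are hypothetical, and the hash-based signature is a stand-in for the paper's state-action signature system.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from hashlib import sha256
from statistics import mean


@dataclass
class Rollout:
    actions: list[str]  # one entry per agent step
    ret: float          # terminal return, e.g. 1.0 if the task was solved


def normalize(action: str) -> str:
    # Assumption: whitespace and case differences do not change the state.
    return " ".join(action.lower().split())


def signature(history: list[str]) -> str:
    """Hypothetical state signature: a short hash of the normalized action prefix."""
    payload = json.dumps([normalize(a) for a in history])
    return sha256(payload.encode("utf-8")).hexdigest()[:16]


def rtmc_advantages(rollouts: list[Rollout]) -> list[list[float]]:
    """Return one advantage per step for each rollout, with no learned critic."""
    state_returns = defaultdict(list)         # signature -> returns through that state
    state_action_returns = defaultdict(list)  # (signature, action) -> returns

    # Pass 1: group returns by matched state and by (state, action).
    for r in rollouts:
        for t, action in enumerate(r.actions):
            s = signature(r.actions[:t])
            state_returns[s].append(r.ret)
            state_action_returns[(s, normalize(action))].append(r.ret)

    # Pass 2: advantage = Monte Carlo Q(s, a) minus Monte Carlo V(s).
    advantages = []
    for r in rollouts:
        row = []
        for t, action in enumerate(r.actions):
            s = signature(r.actions[:t])
            q = mean(state_action_returns[(s, normalize(action))])
            v = mean(state_returns[s])
            row.append(q - v)
        advantages.append(row)
    return advantages


rollouts = [
    Rollout(["open file", "edit A", "run tests"], 1.0),
    Rollout(["open file", "edit A", "submit"], 0.0),
    Rollout(["open file", "edit B", "run tests"], 0.0),
    Rollout(["grep", "edit A"], 0.0),
]
for r, adv in zip(rollouts, rtmc_advantages(rollouts)):
    print(r.actions, [round(a, 3) for a in adv])
```

In this toy group, the first two rollouts share the prefix "open file", "edit A" and diverge only at the third step, so that step receives +0.5 in the successful rollout and -0.5 in the failed one, while their shared earlier steps receive identical advantages. A uniform per-trajectory advantage would instead score all three steps of each rollout the same. A state visited by only one rollout gets an advantage of zero under this sketch, since Q and V coincide there.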


