Adaptive Robust Estimator for Multi-Agent Reinforcement Learning
arXiv cs.AI / 2026-03-24
Key Points
- The paper proposes a robust multi-agent reinforcement learning framework for collaborative reasoning that addresses both interaction-level ambiguity between agents and unstable learning under heavy-tailed/noisy rewards.
- It introduces a Dual-Agent Answer-Critique-Rewrite (DACR) pipeline that structures reasoning into answer, critique, and rewrite stages while enabling explicit credit assignment via marginal contribution across agents.
- It also introduces an Adaptive Robust Estimator (ARE) that computes more reliable estimates of the batch reward mean during multi-agent policy optimization, correcting the biased advantage estimates that noisy reward signals would otherwise produce.
- Experiments on mathematical reasoning and embodied intelligence benchmarks show consistent improvements over baselines in both homogeneous and heterogeneous multi-agent settings, including scenarios with significant reward noise.
- The authors report that the approach improves training stability and helps prevent optimization failures that can arise from noisy or heavy-tailed reward distributions.
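The intuition behind a robust batch estimator like ARE can be illustrated with a simple sketch. The paper's exact estimator is not reproduced here; the snippet below uses a symmetric trimmed mean as an illustrative stand-in, and the function names (`trimmed_mean`, `advantages`) and the `trim_frac` parameter are assumptions for this example, not the authors' API. The point it demonstrates is the one the summary makes: centering advantages on a robust mean instead of the raw batch mean keeps a single heavy-tailed reward from skewing every agent's advantage.

```python
import numpy as np

def trimmed_mean(rewards, trim_frac=0.1):
    """Robust batch mean: drop the lowest and highest trim_frac of samples.

    Illustrative stand-in for an adaptive robust estimator; a heavy-tailed
    outlier in `rewards` barely moves this value, unlike the raw mean.
    """
    r = np.sort(np.asarray(rewards, dtype=float))
    k = int(len(r) * trim_frac)
    core = r[k: len(r) - k] if k > 0 else r
    return float(core.mean())

def advantages(rewards, trim_frac=0.1):
    """Center rewards on the robust baseline instead of the raw batch mean."""
    baseline = trimmed_mean(rewards, trim_frac)
    return np.asarray(rewards, dtype=float) - baseline

# One outlier (100.0) in an otherwise tight batch of rewards:
batch = [1.0, 1.2, 0.9, 1.1, 100.0]
print(np.mean(batch))            # raw mean is dragged to ~20.84
print(trimmed_mean(batch, 0.2))  # robust mean stays near 1.1
print(advantages(batch, 0.2))    # typical samples keep near-zero advantage
```

With the raw mean as the baseline, the four ordinary rewards would all receive large negative advantages because of the single outlier; with the robust baseline they stay near zero, which is the stability property the authors attribute to ARE.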

