LangMARL: Natural Language Multi-Agent Reinforcement Learning

arXiv cs.CL / 4/3/2026


Key Points

  • The paper argues that LLM-based multi-agent systems struggle to develop effective coordination because global outcome signals are too coarse to provide the causal feedback needed for local policy updates.
  • It frames this as a multi-agent credit assignment problem and argues that, while long studied in classical cooperative MARL, this bottleneck remains underaddressed in LLM-based approaches.
  • LangMARL is proposed as a framework that adapts credit assignment and policy-gradient evolution techniques from cooperative MARL into the language space of LLM agents.
  • The method uses agent-level language credit assignment and summarizes task-relevant causal relations from replayed trajectories to generate denser feedback, aiming to improve convergence and performance under sparse rewards.
  • Experiments across multiple cooperative multi-agent tasks reportedly show gains in sample efficiency, interpretability of learned strategies, and generalization.
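The credit-assignment idea in the bullets above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `critique` callable stands in for an LLM judge, and all names (`Step`, `Trajectory`, `language_credit_assignment`, `toy_critic`) are hypothetical. The point is the interface: a coarse global reward plus a replayed trajectory goes in, and per-agent natural-language feedback comes out.

```python
from dataclasses import dataclass

@dataclass
class Step:
    agent: str
    action: str

@dataclass
class Trajectory:
    steps: list
    reward: float  # coarse global outcome signal

def language_credit_assignment(traj, critique):
    """Split a coarse global reward into per-agent textual feedback.

    `critique(step, reward)` stands in for an LLM call that judges each
    agent's contribution from the replayed trajectory (hypothetical
    interface, for illustration only).
    """
    feedback = {}
    for step in traj.steps:
        feedback.setdefault(step.agent, []).append(critique(step, traj.reward))
    # One feedback string per agent, much denser than the single reward.
    return {agent: " ".join(msgs) for agent, msgs in feedback.items()}

# Toy rule-based critic standing in for the LLM judge.
def toy_critic(step, reward):
    verdict = "helped" if reward > 0 else "hurt"
    return f"Action '{step.action}' likely {verdict} the team outcome."

traj = Trajectory(
    steps=[Step("scout", "explore north"), Step("carrier", "idle")],
    reward=1.0,
)
print(language_credit_assignment(traj, toy_critic))
```

With a real LLM in place of `toy_critic`, the per-agent strings would carry the causal detail the global scalar reward obscures.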

Abstract

Large language model (LLM) agents struggle to autonomously evolve coordination strategies in dynamic environments, largely because coarse global outcomes obscure the causal signals needed for local policy refinement. We identify this bottleneck as a multi-agent credit assignment problem, which has long been studied in classical multi-agent reinforcement learning (MARL) but remains underaddressed in LLM-based systems. Building on this observation, we propose LangMARL, a framework that brings credit assignment and policy gradient evolution from cooperative MARL into the language space. LangMARL introduces agent-level language credit assignment, pioneers gradient evolution in language space for policy improvement, and summarizes task-relevant causal relations from replayed trajectories to provide dense feedback and improve convergence under sparse rewards. Extensive experiments across diverse cooperative multi-agent tasks demonstrate improved sample efficiency, interpretability, and strong generalization.
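The abstract's "gradient evolution in language space" can be pictured as replacing a numeric gradient step with an LLM-driven rewrite of the agent's natural-language policy. The sketch below is a heavily simplified stand-in, not LangMARL itself: `reviser` is a hypothetical placeholder for the LLM rewrite call, and the policy is modeled as a plain list of instruction strings.

```python
def textual_policy_update(policy_notes, feedback, reviser):
    """Language-space analogue of a policy-gradient step.

    Instead of nudging numeric parameters along a gradient, the agent's
    policy (a natural-language instruction set) is revised in the
    direction suggested by textual feedback. `reviser` stands in for an
    LLM rewrite call (hypothetical interface).
    """
    return reviser(policy_notes, feedback)

def toy_reviser(notes, feedback):
    # Naive stand-in: record the lesson drawn from the feedback.
    return notes + [f"Lesson: {feedback}"]

policy = ["Prefer exploring unvisited cells."]
policy = textual_policy_update(
    policy,
    "Idling while teammates explored hurt the team outcome.",
    toy_reviser,
)
print(policy)
```

Iterating this update with the per-agent feedback from the credit-assignment step is what lets each agent refine its local strategy even when the environment's reward is sparse.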