CRAFT: Grounded Multi-Agent Coordination Under Partial Information

arXiv cs.CL / 3/27/2026



Key Points

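This paper proposes CRAFT, a new multi-agent benchmark for evaluating pragmatic conversation and coordination under partial information.
The task is formalized as building a shared 3D structure through pragmatic reasoning in natural language, in a setting where no agent can observe the full structure on its own.
It presents a diagnostic framework that decomposes failures into spatial grounding, belief modeling, and pragmatic communication, along with a taxonomy of behavioral failure profiles.
An evaluation of diverse models, including reasoning models (8 open-weight, 7 frontier), shows that stronger reasoning does not necessarily lead to better coordination, and that improving individual communication does not guarantee collaborative success.
The paper concludes that multi-agent coordination remains a fundamentally unsolved challenge for current language models, and points to the code release on GitHub.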

Abstract

We introduce CRAFT, a multi-agent benchmark for evaluating pragmatic communication in large language models under strict partial information. In this setting, multiple agents with complementary but incomplete views must coordinate through natural language to construct a shared 3D structure that no single agent can fully observe. We formalize this problem as a multi-sender pragmatic reasoning task and provide a diagnostic framework that decomposes failures into spatial grounding, belief modeling, and pragmatic communication errors, including a taxonomy of behavioral failure profiles in both frontier and open-weight models. Across a diverse set of 15 models (8 open-weight and 7 frontier, among them reasoning models), we find that stronger reasoning ability does not reliably translate to better coordination: smaller open-weight models often match or outperform frontier systems, and improved individual communication does not guarantee successful collaboration. These results suggest that multi-agent coordination remains a fundamentally unsolved challenge for current language models. Our code can be found at https://github.com/csu-signal/CRAFT
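
To make the partial-information setting concrete, here is a minimal Python sketch of "complementary but incomplete views". It is not taken from the CRAFT codebase: the 2x2x2 grid, the slice-based views, and the names `make_target`, `agent_view`, and `merge_views` are all hypothetical, chosen only to show that each agent sees a strict subset of the structure while the views together cover it.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int]      # (x, y, z) grid position
Structure = Dict[Cell, str]      # position -> block color


def make_target() -> Structure:
    """Build a hypothetical 2x2x2 target structure with alternating colors."""
    colors = ["red", "blue"]
    return {
        (x, y, z): colors[(x + y + z) % 2]
        for x in range(2)
        for y in range(2)
        for z in range(2)
    }


def agent_view(target: Structure, axis: int, value: int) -> Structure:
    """Return the slice of the target an agent can see (assumed view model)."""
    return {cell: color for cell, color in target.items() if cell[axis] == value}


def merge_views(views: List[Structure]) -> Structure:
    """Combine views, standing in for what agents must achieve by talking."""
    merged: Structure = {}
    for view in views:
        merged.update(view)
    return merged


target = make_target()
# Two agents, each seeing one slice along the x axis.
views = [agent_view(target, axis=0, value=0), agent_view(target, axis=0, value=1)]

# No single agent observes the full structure, but the views are complementary.
no_agent_sees_all = all(view != target for view in views)
jointly_complete = merge_views(views) == target
```

In the benchmark itself the merge step is not a dictionary update: agents have to exchange this information through natural language, which is where the grounding, belief-modeling, and communication failures described above can arise.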