Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows

arXiv cs.CL · April 23, 2026


Key Points

  • The study benchmarks 35 open-weight LLMs by having them participate in six behavioral-economics games that test different cooperation mechanisms under shared constraints.
  • It finds that “cooperative profiles” derived from these game behaviors strongly and robustly predict how well LLM agents perform together on AI-for-Science workflows.
  • In particular, models that coordinate effectively in the games and favor multiplicative (team-amplifying) investment over greedy strategies produce better scientific reports.
  • The predictive relationship remains after controlling for multiple factors, suggesting that cooperative disposition is a measurable, distinct property of LLMs rather than just general capability.
  • The authors propose a behavioral-games framework as a fast, low-cost diagnostic tool to screen for cooperative “fitness” before deploying expensive multi-agent systems.
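The screening idea in the last point can be illustrated with a toy example. The sketch below uses a standard linear public goods game (a common behavioral-economics cooperation game; the paper's actual six games, agents, and scoring are not specified here), showing why multiplicative team production beats greedy free-riding and how a simple "cooperative profile" score could be read off agent behavior:

```python
def public_goods_round(contributions, endowment=10.0, multiplier=1.6):
    """One round of a linear public goods game.

    Each agent keeps (endowment - contribution) and receives an equal
    share of the multiplied common pot. With multiplier > 1, full
    contribution maximizes total team payoff, even though free-riding
    is individually tempting.
    """
    n = len(contributions)
    pot = multiplier * sum(contributions)
    return [endowment - c + pot / n for c in contributions]

def cooperative_profile(contributions, endowment=10.0):
    """Mean fraction of endowment contributed -- a toy stand-in for a
    game-derived cooperation score (hypothetical metric, not the
    paper's)."""
    return sum(c / endowment for c in contributions) / len(contributions)

# A fully cooperative team vs. a greedy (zero-contribution) team of 4 agents.
coop_payoffs = public_goods_round([10.0] * 4)    # each: 1.6 * 40 / 4 = 16.0
greedy_payoffs = public_goods_round([0.0] * 4)   # each keeps its 10.0

assert sum(coop_payoffs) > sum(greedy_payoffs)
print(cooperative_profile([10.0] * 4), sum(coop_payoffs))    # 1.0 64.0
print(cooperative_profile([0.0] * 4), sum(greedy_payoffs))   # 0.0 40.0
```

In this setup the cooperative team's total payoff (64) exceeds the greedy team's (40), mirroring the paper's finding that agents investing in multiplicative team production outperform greedy strategies; a score like `cooperative_profile` could then be computed cheaply before any expensive multi-agent deployment.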

Abstract

Multi-agent systems built from teams of large language models (LLMs) are increasingly deployed for collaborative scientific reasoning and problem-solving. These systems require agents to coordinate under shared constraints, such as GPUs or credit balances, where cooperative behavior matters. Behavioral economics provides a rich toolkit of games that isolate distinct cooperation mechanisms, yet it remains unknown whether a model's behavior in these stylized settings predicts its performance in realistic collaborative tasks. Here, we benchmark 35 open-weight LLMs across six behavioral economics games and show that game-derived cooperative profiles robustly predict downstream performance in AI-for-Science tasks, where teams of LLM agents collaboratively analyze data, build models, and produce scientific reports under shared budget constraints. Models that coordinate effectively in games and invest in multiplicative team production (rather than greedy strategies) produce better scientific reports across three outcomes: accuracy, quality, and completion. These associations hold after controlling for multiple factors, indicating that cooperative disposition is a distinct, measurable property of LLMs not reducible to general ability. Our behavioral-games framework thus offers a fast and inexpensive diagnostic for screening cooperative fitness before costly multi-agent deployment.