High Volatility and Action Bias Distinguish LLMs from Humans in Group Coordination

arXiv cs.AI / 4/6/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper compares LLM and human coordination in a no-communication common-interest game called Group Binary Search, where players iteratively adjust numeric submissions based on imperfect group feedback.
  • Results show that humans typically adapt and stabilize their behavior over repeated games, while LLMs often fail to improve and display excessive action switching that hinders convergence.
  • The study finds that providing more informative feedback (such as the magnitude of numerical error) strongly helps human participants but has only minor effects on LLM performance.
  • Using mechanism-level diagnostics like reactivity scaling and switching dynamics across games, the authors highlight behavioral differences between human and LLM groups and propose a grounded way to diagnose the “coordination gap.”

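The summary does not spell out the game's exact protocol (value ranges, feedback format, or round limits), but the described loop, independent numeric submissions adjusted from coarse group feedback, can be sketched as a toy simulation. The parameters and the 80% adjustment probability below are illustrative assumptions, not the paper's setup:

```python
import random

def play_gbs_round(submissions, target):
    """Return coarse group feedback: whether the sum is low, high, or on target."""
    total = sum(submissions)
    if total == target:
        return "hit"
    return "low" if total < target else "high"

def simulate_gbs(n_players=4, target=20, max_rounds=50, seed=0):
    """Toy Group Binary Search: players nudge submissions using only shared feedback.

    All parameters are illustrative; the paper's actual protocol may differ.
    """
    rng = random.Random(seed)
    subs = [rng.randint(0, 10) for _ in range(n_players)]
    for round_idx in range(1, max_rounds + 1):
        feedback = play_gbs_round(subs, target)
        if feedback == "hit":
            return round_idx  # rounds needed to converge
        # Each player independently moves one step toward the target
        # with some probability, using only the shared low/high signal.
        step = 1 if feedback == "low" else -1
        subs = [max(0, s + step) if rng.random() < 0.8 else s for s in subs]
    return None  # no convergence within the round budget
```

Because all players see the same low/high signal and adjust in the same direction, the group sum can overshoot and oscillate around the target, which is exactly the kind of coordination failure the excessive-switching finding describes.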
Abstract

Humans exhibit remarkable abilities to coordinate in groups. As large language models (LLMs) become more capable, it remains an open question whether they can demonstrate comparable adaptive coordination and whether they use the same strategies as humans. To investigate this, we compare LLM and human performance on a common-interest game with imperfect monitoring: Group Binary Search. In this n-player game, participants need to coordinate their actions to achieve a common objective. Players independently submit numerical values in an effort to collectively sum to a randomly assigned target number. Without direct communication, they rely on group feedback to iteratively adjust their submissions until they reach the target number. Our findings show that, unlike humans who adapt and stabilize their behavior over time, LLMs often fail to improve across games and exhibit excessive switching, which impairs group convergence. Moreover, richer feedback (e.g., numerical error magnitude) benefits humans substantially but has small effects on LLMs. Taken together, by grounding the analysis in human baselines and mechanism-level metrics, including reactivity scaling, switching dynamics, and learning across games, we point to differences between human and LLM groups and provide a behaviorally grounded diagnostic for closing the coordination gap.
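The abstract names "switching dynamics" as one of the mechanism-level metrics but does not define it here. As an illustration only, a minimal switch-rate diagnostic, the fraction of consecutive rounds in which a player changed their submission, might look like this (the function name and the sample trajectories are hypothetical):

```python
def switch_rate(trajectory):
    """Fraction of consecutive rounds in which the submission changed."""
    if len(trajectory) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(trajectory, trajectory[1:]))
    return switches / (len(trajectory) - 1)

# Hypothetical trajectories: a volatile player vs. one who stabilizes.
volatile = [3, 7, 2, 9, 1, 8]   # changes every round
stable   = [3, 4, 5, 5, 5, 5]   # settles on a value
```

Under this definition the volatile trajectory scores 1.0 and the stabilizing one 0.4, mirroring the paper's contrast between LLMs' excessive action switching and humans' stabilization over repeated games.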