Before Humans Join the Team: Diagnosing Coordination Failures in Healthcare Robot Team Simulation

arXiv cs.RO / 4/9/2026


Key Points

  • The paper introduces an agent-simulation method that instantiates every robot-team role, including a supervisory manager, as LLM agents to study coordination failures before humans participate in real healthcare settings.
  • Two experiments using a controllable healthcare scenario compare hierarchical team configurations to analyze coordination behaviors and distinct failure patterns.
  • The results suggest that team structure is the main bottleneck for coordination, rather than contextual knowledge availability or raw model capability.
  • The study identifies a trade-off between reasoning autonomy (how freely agents decide) and overall system stability in coordinated multi-agent operation.
  • The authors provide supplementary artifacts (code, agent setup, traces, and annotated failure examples) to support resilient team design, process-level evaluation, clearer coordination protocols, and safer human integration.

Abstract

As humans move toward collaborating with coordinated robot teams, understanding how these teams coordinate and fail is essential for building trust and ensuring safety. However, exposing human collaborators to coordination failures during early-stage development is costly and risky, particularly in high-stakes domains such as healthcare. We adopt an agent-simulation approach in which all team roles, including the supervisory manager, are instantiated as LLM agents, allowing us to diagnose coordination failures before humans join the team. Using a controllable healthcare scenario, we conduct two studies with different hierarchical configurations to analyze coordination behaviors and failure patterns. Our findings reveal that team structure, rather than contextual knowledge or model capability, constitutes the primary bottleneck for coordination, and expose a tension between reasoning autonomy and system stability. By surfacing these failures in simulation, we lay the groundwork for safe human integration. These findings inform the design of resilient robot teams, with implications for process-level evaluation, transparent coordination protocols, and structured human integration. Supplementary materials, including code, task agent setup, trace outputs, and annotated examples of coordination failures and reasoning behaviors, are available at: https://byc-sophie.github.io/mas-to-mars/.
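To make the setup concrete, here is a minimal, hypothetical sketch of the kind of simulation the abstract describes: every role, including the supervisory manager, is an agent, and the run produces a trace that can be mined for coordination failures. This is not the authors' code; the LLM calls are stubbed with scripted policies, and all names (`Agent`, `TeamSimulator`, the `DISPATCH`/`ACK`/`IDLE` messages) are illustrative assumptions.

```python
# Hypothetical sketch of an agent-simulation harness for a hierarchical
# robot team. Real LLM calls are replaced by scripted policies so the
# coordination loop and trace logging are self-contained and runnable.
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str
    policy: dict  # maps an incoming message to this agent's reply
    def act(self, message: str) -> str:
        # In the real setting this would be an LLM call conditioned on a
        # role prompt; here we look up a scripted response. An agent with
        # no response for the message stalls ("IDLE").
        return self.policy.get(message, "IDLE")

@dataclass
class TeamSimulator:
    manager: Agent
    workers: list
    trace: list = field(default_factory=list)  # (role, input, output) triples

    def step(self, task: str) -> None:
        # Hierarchical configuration: the manager turns the task into a
        # directive, and every worker responds to that directive.
        directive = self.manager.act(task)
        self.trace.append((self.manager.role, task, directive))
        for w in self.workers:
            self.trace.append((w.role, directive, w.act(directive)))

    def coordination_failures(self) -> list:
        # Process-level check on the trace: flag steps where a worker did
        # not act on the manager's directive (one simple failure pattern
        # among the many a fuller analysis would annotate).
        return [t for t in self.trace if t[0] != self.manager.role and t[2] == "IDLE"]

# Usage: a manager and two worker robots in a toy healthcare task.
manager = Agent("manager", {"deliver meds to room 3": "DISPATCH nurse_bot"})
nurse = Agent("nurse_bot", {"DISPATCH nurse_bot": "ACK"})
porter = Agent("porter_bot", {})  # no policy for this directive -> stalls
sim = TeamSimulator(manager, [nurse, porter])
sim.step("deliver meds to room 3")
print(sim.coordination_failures())  # the porter's stalled step
```

Because failures are diagnosed from the logged trace rather than from task outcomes, the same harness supports the process-level evaluation the paper argues for, before any human is exposed to the team.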