Graph-of-Agents: A Graph-based Framework for Multi-Agent LLM Collaboration

arXiv cs.AI / April 21, 2026

📰 News · Models & Research

Key Points

  • The paper introduces Graph-of-Agents (GoA), a graph-based framework to orchestrate multi-agent LLM collaboration for better task performance.
  • GoA improves over prior approaches by (1) sampling only the most relevant agents using model-card metadata, (2) creating graph edges via response-based relevance ordering, and (3) using directed message passing plus reverse refinement before aggregating answers with graph pooling.
  • Experiments across multiple benchmarks (MMLU, MMLU-Pro, GPQA, MATH, HumanEval, MedMCQA) with a pool of six LLMs show that GoA can outperform baselines that use all six agents.
  • Notably, GoA achieves stronger results using only three selected agents, suggesting relevance-based selection and structured communication can reduce agent count without sacrificing quality.
  • The authors provide code on GitHub and position GoA as a scalable method for managing the growing number of available LLMs and agent candidates.

Abstract

With an ever-growing zoo of LLMs and benchmarks, the need to orchestrate multiple models for improved task performance has never been more pressing. While frameworks like Mixture-of-Agents (MoA) attempt to coordinate LLMs, they often fall short in terms of (1) selecting relevant agents, (2) facilitating effective inter-agent communication, and (3) integrating responses efficiently. In this work, we propose Graph-of-Agents (GoA), a new graph-based framework for modeling multi-agent LLM communication. Our approach begins with node sampling, selecting only the most relevant agents by leveraging model cards that summarize each model's domain, task specialization, and other characteristics. Next, we construct edges between the selected agents by evaluating their responses against one another to determine relevance ordering. Directed message passing is then performed from highly relevant agents to less relevant ones to enhance their responses, followed by reverse message passing to refine the original responses of the more relevant agents. Finally, the updated responses are aggregated via graph-based pooling (e.g., max or mean pooling) to produce a single, unified answer. We evaluate GoA on diverse multi-domain benchmarks (MMLU, MMLU-Pro, GPQA) and domain-specific benchmarks (MATH, HumanEval, MedMCQA), with an agent pool of 6 LLMs spanning multiple domains. Surprisingly, GoA achieves superior performance using only 3 selected agents, outperforming recent multi-agent LLM baselines that utilize all 6 agents simultaneously. By adopting a graph structure, GoA offers both scalability and effectiveness through structured message passing, positioning it as a strong candidate for navigating the challenges of the ever-growing LLM zoo. Code is available at: https://github.com/UNITES-Lab/GoA.
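To make the four-stage pipeline in the abstract concrete, here is a minimal Python sketch of that flow. Everything below is an illustrative assumption, not the authors' implementation (see their GitHub repository for that): the keyword-overlap relevance score, the `score_response` callable standing in for cross-evaluation of responses, and the prompt templates are all hypothetical stand-ins.

```python
# Hypothetical sketch of the GoA pipeline described in the abstract.
# All names, scoring rules, and prompt formats are illustrative assumptions,
# not the authors' actual implementation.
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    card_keywords: set   # model-card metadata: domain, task specialization, etc.
    respond: callable    # query string -> answer string (stands in for an LLM call)
    response: str = ""


def card_relevance(agent, query_keywords):
    # Node-sampling signal: overlap between the query and the model card.
    return len(agent.card_keywords & query_keywords)


def run_goa(agents, query, query_keywords, k=3, score_response=None):
    # 1) Node sampling: keep only the k most relevant agents.
    selected = sorted(agents, key=lambda a: card_relevance(a, query_keywords),
                      reverse=True)[:k]

    # 2) Collect initial responses, then build edges by relevance ordering
    #    (a user-supplied scorer stands in for evaluating responses
    #    against one another).
    for a in selected:
        a.response = a.respond(query)
    ranked = sorted(selected, key=lambda a: score_response(a.response),
                    reverse=True)

    # 3) Directed message passing: more relevant agents inform less
    #    relevant ones along the ordering...
    for hi, lo in zip(ranked, ranked[1:]):
        lo.response = lo.respond(f"{query}\nHint from peer: {hi.response}")
    # ...then reverse message passing refines the stronger agents' answers.
    rev = list(reversed(ranked))
    for lo, hi in zip(rev, rev[1:]):
        hi.response = hi.respond(f"{query}\nPeer update: {lo.response}")

    # 4) Graph pooling (max pooling by score here) yields one unified answer.
    return max((a.response for a in ranked), key=score_response)
```

With toy agents whose `respond` functions are plain lambdas and response length as a dummy scorer, `run_goa(agents, "Q", {"math"}, k=2, score_response=len)` selects the two agents whose model cards mention "math" and returns a single pooled answer, mirroring the 3-of-6 selection result reported in the paper at smaller scale.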