Do LLM-derived graph priors improve multi-agent coordination?

arXiv cs.LG / April 21, 2026


Key Points

  • The paper explores whether large language models (LLMs) can produce coordination graph priors for multi-agent reinforcement learning (MARL) from minimal natural-language descriptions of observations.
  • These LLM-derived priors are injected into MARL using graph convolution layers in a GNN-based pipeline to guide how agents coordinate.
  • Experiments on four cooperative scenarios from the Multi-Agent Particle Environment (MPE) show quantitative improvements over baselines ranging from independent learners to state-of-the-art graph-based coordination methods.
  • The approach works even with compact open-source LLMs: models as small as 1.5B parameters generate effective priors.
  • An ablation across five compact open-source LLMs evaluates how sensitive the quality of the generated priors is to the chosen model.
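The paper does not publish its prompting interface, but the first key point implies a step where an LLM's natural-language reply is converted into a coordination graph. A minimal sketch of that parsing step, assuming a hypothetical response format in which the model lists coordination edges as "(i, j)" pairs:

```python
import re

def parse_coordination_prior(llm_reply: str, n_agents: int) -> list[list[int]]:
    """Parse an LLM's free-text reply into a symmetric adjacency matrix.

    The "(i, j)" edge-list response format is a hypothetical convention,
    not the paper's actual prompt/response protocol.
    """
    adj = [[0] * n_agents for _ in range(n_agents)]
    for i_str, j_str in re.findall(r"\((\d+)\s*,\s*(\d+)\)", llm_reply):
        i, j = int(i_str), int(j_str)
        if i != j and i < n_agents and j < n_agents:
            adj[i][j] = adj[j][i] = 1  # undirected coordination link
    return adj

reply = "Agents that should coordinate: (0, 1), (1, 2)"
prior = parse_coordination_prior(reply, n_agents=3)
# prior == [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  (a line graph over 3 agents)
```

Out-of-range or self-loop pairs are silently dropped, a reasonable guard given that small LLMs can emit malformed edge lists.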

Abstract

Multi-agent reinforcement learning (MARL) is crucial for AI systems that operate collaboratively in distributed and adversarial settings, particularly in multi-domain operations (MDO). A central challenge in cooperative MARL is determining how agents should coordinate: existing approaches hand-specify graph topology, rely on proximity-based heuristics, or learn structure entirely from environment interaction, all of which are brittle, semantically uninformed, or data-intensive. We investigate whether large language models (LLMs) can generate useful coordination graph priors for MARL by using minimal natural language descriptions of agent observations to infer latent coordination patterns. These priors are integrated into MARL algorithms via graph convolutional layers within a graph neural network (GNN)-based pipeline, and evaluated on four cooperative scenarios from the Multi-Agent Particle Environment (MPE) benchmark against baselines spanning the full spectrum of coordination modeling, from independent learners to state-of-the-art graph-based methods. We further ablate across five compact open-source LLMs to assess the sensitivity of prior quality to model choice. Our results provide the first quantitative evidence that LLM-derived graph priors can enhance coordination and adaptability in dynamic multi-agent environments, and demonstrate that models as small as 1.5B parameters are sufficient for effective prior generation.
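The abstract's "graph convolutional layers" step can be illustrated with one symmetric-normalized propagation over a fixed prior adjacency, in the style of Kipf-Welling GCNs. This is a dependency-free sketch, not the paper's architecture: the learned weight matrix and nonlinearity are omitted, and the prior and observations below are made-up inputs.

```python
import math

def gcn_step(adj: list[list[int]], feats: list[list[float]]) -> list[list[float]]:
    """One graph-convolution propagation step, H' = D^{-1/2}(A + I)D^{-1/2} H.

    Each agent's feature vector is mixed with its neighbors' features,
    with the neighborhood dictated by the (here fixed) prior adjacency.
    """
    n = len(adj)
    # Add self-loops so each agent retains its own observation.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(feats[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if a_hat[i][j]:
                w = a_hat[i][j] / math.sqrt(deg[i] * deg[j])  # symmetric norm
                for k in range(dim):
                    row[k] += w * feats[j][k]
        out.append(row)
    return out

# 3 agents on a line-graph prior, each with a 2-dim observation embedding
prior = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
obs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = gcn_step(prior, obs)
```

After one step, the middle agent's representation aggregates information from both endpoints of the line graph, which is the mechanism by which a good prior can steer credit assignment toward the right coordination partners.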