GraphPlanner: Graph Memory-Augmented Agentic Routing for Multi-Agent LLMs

arXiv cs.CL · April 28, 2026


Key Points

  • GraphPlanner is a new heterogeneous graph memory-augmented routing approach for agentic (multi-round, multi-agent) LLM systems, where routing must handle task planning, multi-round cooperation, and memory use rather than simple one-shot model selection.
  • It generates a per-query routing workflow by formulating the decision process as a Markov Decision Process (MDP), selecting both an LLM backbone and an agent role (Planner, Executor, Summarizer) at each step.
  • Using a heterogeneous graph called GARNet, GraphPlanner captures interaction memories among queries, agents, and responses, and fuses historical and workflow memory into richer state representations.
  • The full pipeline is optimized with reinforcement learning, and experiments on 14 diverse LLM tasks show up to a 9.3% accuracy improvement while drastically reducing GPU memory usage (from 186.26 GiB to 1.04 GiB).
  • The method also demonstrates strong generalization (including robust zero-shot performance on unseen tasks/LLMs) and effective use of historical memories for both inductive and transductive inference.
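The per-query workflow generation described above can be sketched as an MDP rollout in which each step emits a (backbone, role) action. This is a minimal illustrative sketch, not the paper's implementation: the names (`Action`, `policy`, `generate_workflow`) and the fixed model pool are assumptions, and the toy hand-written policy stands in for GraphPlanner's learned RL policy conditioned on graph-memory state.

```python
# Toy sketch of GraphPlanner-style workflow generation as an MDP.
# All identifiers here are illustrative; the real system uses a learned
# RL policy over graph-memory-augmented state representations.
from dataclasses import dataclass

ROLES = ("Planner", "Executor", "Summarizer")   # agent roles from the paper
BACKBONES = ("llm-small", "llm-large")          # placeholder model pool

@dataclass(frozen=True)
class Action:
    backbone: str   # which LLM to run this step on
    role: str       # which agent role the LLM plays

def policy(state: list[Action]) -> Action:
    """Toy stand-in policy: plan, then execute, then summarize,
    always on the small backbone. The paper learns this via RL."""
    step = len(state)
    role = ROLES[min(step, len(ROLES) - 1)]
    return Action(backbone=BACKBONES[0], role=role)

def generate_workflow(query: str, max_steps: int = 5) -> list[Action]:
    """Roll out the MDP: each step selects both a backbone and a role,
    appending the action to the workflow state."""
    state: list[Action] = []
    for _ in range(max_steps):
        state.append(policy(state))
        if state[-1].role == "Summarizer":   # treat summarization as terminal
            break
    return state

workflow = generate_workflow("What is the capital of France?")
```

A rollout under this toy policy yields a three-step workflow (Planner, Executor, Summarizer), mirroring the role sequence the key points describe; in the actual system the policy would also vary the backbone per step to trade off cost and accuracy.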

Abstract

LLM routing has achieved promising results in integrating the strengths of diverse models while balancing efficiency and performance. However, to support more realistic and challenging applications, routing must extend into agentic LLM settings, where task planning, multi-round cooperation among heterogeneous agents, and memory utilization are indispensable. To address this gap, we propose GraphPlanner, a heterogeneous graph memory-augmented agentic router for multi-agent LLMs that generates routing workflows for each query and supports both inductive and transductive inference. GraphPlanner formulates workflow generation as a Markov Decision Process (MDP), where at each step it selects both the LLM backbone and the agent role, including Planner, Executor, and Summarizer. By leveraging a heterogeneous graph, denoted as GARNet, to capture interaction memories among queries, agents, and responses, GraphPlanner integrates historical memory and workflow memory into richer state representations. The entire pipeline is optimized with reinforcement learning, jointly improving task-specific performance and computational efficiency. We evaluate GraphPlanner across 14 diverse LLM tasks and demonstrate that: (1) GraphPlanner outperforms strong single-round and multi-round routers, improving accuracy by up to 9.3% while reducing GPU cost from 186.26 GiB to 1.04 GiB; (2) GraphPlanner generalizes robustly to unseen tasks and LLMs, exhibiting strong zero-shot capabilities; and (3) GraphPlanner effectively leverages historical memories, supporting both inductive and transductive inference for more adaptive routing. Our code for GraphPlanner is released at https://github.com/ulab-uiuc/GraphPlanner.
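The heterogeneous interaction-memory graph described in the abstract (queries, agents, and responses connected by interaction edges) can be sketched with a small adjacency structure. This is a hedged illustration only: the class and method names are invented, and the real GARNet fuses these memories into learned state embeddings rather than raw adjacency lookups.

```python
# Illustrative heterogeneous interaction-memory graph in the spirit of GARNet.
# The three node types (query, agent, response) come from the abstract;
# everything else here is an assumed, simplified stand-in.
from collections import defaultdict

class MemoryGraph:
    def __init__(self):
        self.nodes = {}                 # node_id -> node type
        self.edges = defaultdict(set)   # node_id -> set of neighbor ids

    def add_node(self, node_id, node_type):
        assert node_type in {"query", "agent", "response"}
        self.nodes[node_id] = node_type

    def add_interaction(self, query_id, agent_id, response_id):
        # Record one routing interaction as query—agent and agent—response edges.
        for a, b in ((query_id, agent_id), (agent_id, response_id)):
            self.edges[a].add(b)
            self.edges[b].add(a)

    def neighbors_of_type(self, node_id, node_type):
        # Retrieve historical memory of one type, e.g. past queries an agent saw.
        return [n for n in self.edges[node_id] if self.nodes[n] == node_type]

# Usage: store one interaction, then query the agent's historical memory.
g = MemoryGraph()
g.add_node("q1", "query")
g.add_node("planner-1", "agent")
g.add_node("r1", "response")
g.add_interaction("q1", "planner-1", "r1")
past_queries = g.neighbors_of_type("planner-1", "query")
```

In the paper's setting, such interaction memories would be encoded and fused with workflow memory into the MDP state, which is what lets the router support both inductive (new queries) and transductive (previously seen context) inference.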