Training-Free Agentic AI: Probabilistic Control and Coordination in Multi-Agent LLM Systems

arXiv cs.CL / 3/17/2026

Key Points

  • REDEREF is a lightweight, training-free controller that coordinates multi-agent LLM collaboration to improve routing efficiency during recursive delegation.
  • It combines four components: belief-guided delegation via Thompson sampling, which prioritizes agents with historically positive marginal contributions; reflection-driven re-routing using a calibrated LLM or programmatic judge; evidence-based selection rather than output averaging; and memory-aware priors to reduce cold-start inefficiency.
  • Across multi-agent split-knowledge tasks, REDEREF reduces token usage by 28%, agent calls by 17%, and time-to-success by 19% compared with random recursive delegation.
  • The method adapts gracefully under agent or judge degradation and does not require training or fine-tuning.
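
The belief-guided delegation described above can be illustrated with a minimal Python sketch of Thompson sampling over Beta-Bernoulli beliefs. This summary does not specify REDEREF's belief model or reward signal, so the `BetaBelief` and `ThompsonRouter` names, the uniform Beta(1, 1) prior, and the binary "helped" signal are all assumptions made for illustration, not the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief that calling an agent helps (assumed model)."""
    alpha: float = 1.0  # prior pseudo-count of helpful calls (uniform prior)
    beta: float = 1.0   # prior pseudo-count of unhelpful calls

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, helped: bool) -> None:
        # Conjugate Beta-Bernoulli update: one pseudo-count per observation.
        if helped:
            self.alpha += 1.0
        else:
            self.beta += 1.0


class ThompsonRouter:
    """Hypothetical router: sample each belief, delegate to the highest draw."""

    def __init__(self, agent_names, seed=None):
        self.rng = random.Random(seed)
        self.beliefs = {name: BetaBelief() for name in agent_names}

    def pick(self, exclude=()):
        # Skip agents already tried for this task, then draw once per agent.
        candidates = [n for n in self.beliefs if n not in exclude]
        if not candidates:
            raise ValueError("no agents left to route to")
        draws = {
            n: self.rng.betavariate(self.beliefs[n].alpha, self.beliefs[n].beta)
            for n in candidates
        }
        return max(draws, key=draws.get)

    def record(self, name, helped):
        self.beliefs[name].update(helped)
```

A caller would alternate `router.pick()` and `router.record(name, helped)`. Agents that keep helping accumulate a larger `alpha` and win the draw more often, while rarely helpful agents still receive occasional exploratory calls because their samples are random rather than fixed at the mean.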

Abstract

Multi-agent large language model (LLM) systems enable complex, long-horizon reasoning by composing specialized agents, but practical deployment remains hindered by inefficient routing, noisy feedback, and high interaction cost. We introduce REDEREF, a lightweight and training-free controller for multi-agent LLM collaboration that improves routing efficiency during recursive delegation. REDEREF integrates (i) belief-guided delegation via Thompson sampling to prioritize agents with historically positive marginal contributions, (ii) reflection-driven re-routing using a calibrated LLM or programmatic judge, (iii) evidence-based selection rather than output averaging, and (iv) memory-aware priors to reduce cold-start inefficiency. Across multi-agent split-knowledge tasks, we show that while recursive retry alone saturates task success, belief-guided routing reduces token usage by 28%, agent calls by 17%, and time-to-success by 19% compared to random recursive delegation, and adapts gracefully under agent or judge degradation. These results demonstrate that simple, interpretable probabilistic control can meaningfully improve the efficiency and robustness of multi-agent LLM systems without training or fine-tuning.
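
To make the interplay of the four components more concrete, here is a rough Python sketch of one delegation episode. It rests on several assumptions that the abstract does not confirm: recursion is flattened into a simple retry loop, an agent counts as having "helped" when the judge score clears a fixed `threshold`, memory is a table of past (success, failure) counts per agent, and the names `warm_start`, `delegate`, `weight`, and `max_calls` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Agent = Callable[[str], str]         # task -> candidate answer
Judge = Callable[[str, str], float]  # (task, answer) -> score in [0, 1]


def warm_start(agents: Dict[str, Agent],
               memory: Dict[str, Tuple[int, int]],
               weight: float = 0.5) -> Dict[str, List[float]]:
    """Memory-aware prior (assumed form): Beta(1 + w*wins, 1 + w*losses)."""
    beliefs = {}
    for name in agents:
        wins, losses = memory.get(name, (0, 0))
        beliefs[name] = [1.0 + weight * wins, 1.0 + weight * losses]
    return beliefs


def delegate(task: str,
             agents: Dict[str, Agent],
             judge: Judge,
             beliefs: Dict[str, List[float]],
             rng: random.Random,
             threshold: float = 0.7,
             max_calls: int = 5) -> Tuple[Optional[str], float, List[str]]:
    """Route, judge, and re-route until an answer passes or the budget ends."""
    best_answer, best_score = None, -1.0
    trace: List[str] = []
    tried = set()
    for _ in range(max_calls):
        candidates = [n for n in agents if n not in tried]
        if not candidates:
            break
        # Belief-guided delegation: one Thompson draw per untried agent.
        name = max(candidates, key=lambda n: rng.betavariate(*beliefs[n]))
        answer = agents[name](task)
        score = judge(task, answer)  # reflection step
        passed = score >= threshold
        beliefs[name][0 if passed else 1] += 1.0  # update alpha or beta
        trace.append(name)
        tried.add(name)
        # Evidence-based selection: keep the single best-judged answer
        # instead of averaging or merging outputs.
        if score > best_score:
            best_answer, best_score = answer, score
        if passed:
            break  # otherwise re-route to another agent
    return best_answer, best_score, trace
```

In this sketch a failed judgment lowers the belief in the agent that produced it, so later tasks are less likely to start with that agent, and the warm-started priors let a new episode favor agents that did well in the past without any gradient-based training.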