Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling

arXiv cs.AI / 4/21/2026

📰 News · Developer Stack & Infrastructure · Models & Research

Key Points

  • The paper argues that while LLM scaling has focused on model- and system-level improvements, algorithm-level (extra inference-time compute) and task-level (multi-agent decomposition and delegation) scaling are still underexplored.
  • It introduces Hive, a multi-agent infrastructure with a description frontend to capture per-agent behavior and enable test-time scaling algorithms.
  • Hive’s backend includes Logits Cache to reuse intermediate logits across redundant sampling branches, reducing cross-path redundancy and improving resampling speed.
  • It also includes Agent-Aware Scheduling to allocate compute and KV-cache resources based on each agent’s contribution, improving task-level parallel efficiency.
  • Experiments report 1.11×–1.76× speedups for resampling with Logits Cache and a 33%–51% reduction in hotspot miss rate with Agent-Aware Scheduling.
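The Logits Cache idea above can be sketched as a toy. This is an illustrative assumption, not Hive's actual implementation or API: a stand-in model function and a cache keyed by token prefix show how several resampling branches that share a prefix can reuse one set of logits instead of recomputing a forward pass per branch.

```python
import random
from typing import Dict, List, Tuple

Prefix = Tuple[int, ...]

def toy_model(prefix: Prefix, vocab_size: int = 8) -> List[float]:
    """Deterministic stand-in for an LLM forward pass (hypothetical)."""
    rng = random.Random(hash(prefix))
    return [rng.random() for _ in range(vocab_size)]

class LogitsCache:
    """Caches logits per token prefix so redundant branches skip recompute."""
    def __init__(self) -> None:
        self.cache: Dict[Prefix, List[float]] = {}
        self.misses = 0  # counts actual model calls

    def logits_for(self, prefix: Prefix) -> List[float]:
        if prefix not in self.cache:      # first branch pays the compute cost
            self.misses += 1
            self.cache[prefix] = toy_model(prefix)
        return self.cache[prefix]         # later branches reuse the logits

def resample(cache: LogitsCache, prefix: Prefix, n_branches: int) -> List[int]:
    """Draw n_branches next-token samples from one shared set of logits."""
    logits = cache.logits_for(prefix)
    rng = random.Random(0)
    return [rng.choices(range(len(logits)), weights=logits)[0]
            for _ in range(n_branches)]

cache = LogitsCache()
samples = resample(cache, (1, 2, 3), n_branches=4)
print(len(samples), cache.misses)  # 4 branches served by a single model call
```

Resampling the same prefix again hits the cache, which is the cross-path redundancy the paper targets; the reported 1.11×–1.76× speedups come from avoiding exactly these repeated forward passes.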

Abstract

Large language models are increasingly deployed as complex agentic systems that scale with task complexity. While prior work has extensively explored model- and system-level scaling, algorithm- and task-level scaling remain largely unaddressed, constraining the full potential of agentic systems. At the algorithm level, allocating additional inference-time computation can enhance workflow capacity but introduces cross-path redundancy: overlapping computations across multiple reasoning branches. At the task level, complex tasks can be decomposed into subproblems and delegated across multiple agents for improved scalability and parallelism. However, existing infrastructures' schedulers are unaware of the multiple agents involved, missing opportunities to optimize resource allocation. We propose Hive, a multi-agent infrastructure that enables algorithm- and task-level scaling. Hive features a description frontend that captures per-agent behavior and supports test-time scaling algorithms. Leveraging this specification, our backend introduces two key mechanisms: Logits Cache, which reuses intermediate logits across redundant sampling paths to mitigate cross-path redundancy at the algorithm level, and Agent-Aware Scheduling, which efficiently allocates compute and KV-cache resources according to agent contributions at the task level. Experiments show that Logits Cache achieves an average speedup of 1.11×–1.76× for resampling, and Agent-Aware Scheduling reduces the hotspot miss rate by 33%–51%.
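Agent-aware allocation can be sketched with a minimal proportional-split routine, assuming each agent carries a scalar "contribution" score. The metric, names, and function below are illustrative assumptions, not Hive's actual scheduler interface.

```python
from typing import Dict

def allocate_kv_blocks(contribs: Dict[str, float], total_blocks: int) -> Dict[str, int]:
    """Split a fixed KV-cache block budget proportionally to each agent's
    contribution score, using largest-remainder rounding so that every
    block in the budget is assigned to some agent."""
    total = sum(contribs.values())
    raw = {a: total_blocks * c / total for a, c in contribs.items()}
    alloc = {a: int(r) for a, r in raw.items()}  # floor of each share
    leftover = total_blocks - sum(alloc.values())
    # hand leftover blocks to the agents with the largest fractional remainders
    for a in sorted(raw, key=lambda a: raw[a] - alloc[a], reverse=True)[:leftover]:
        alloc[a] += 1
    return alloc

# hypothetical agents with contribution scores summing to 1.0
agents = {"planner": 0.5, "coder": 0.3, "critic": 0.2}
print(allocate_kv_blocks(agents, 10))  # {'planner': 5, 'coder': 3, 'critic': 2}
```

The point of being agent-aware is visible even in this toy: a scheduler that treated all requests identically would give each agent the same share, whereas contribution-weighted allocation keeps hot agents' KV state resident, which is the mechanism behind the reported 33%–51% hotspot miss-rate reduction.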