GRAIL: A Deep-Granularity Hybrid Resonance Framework for Real-Time Agent Discovery via SLM-Enhanced Indexing

arXiv cs.AI / 5/5/2026


Key Points

  • The paper introduces GRAIL, a hybrid agent-discovery framework designed to achieve sub-400ms latency while maintaining high accuracy for large-scale multi-agent collaboration.
  • It replaces slow, heavy LLM intent parsing with an SLM-enhanced prediction module that produces capability/taxonomy tags at millisecond speed.
  • To improve retrieval quality, GRAIL expands agent descriptions via pseudo-document expansion (synthetic queries) to increase semantic density for dense retrieval.
  • A MaxSim Resonance matching step computes maximum similarity between user queries and discrete agent usage examples to prevent semantic dilution and improve precision.
  • Experiments on the new AgentTaxo-9K dataset (9,240 agents) show GRAIL cuts end-to-end discovery latency by over 79× versus LLM-parsing baselines and surpasses traditional vector search on Recall@10.
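The MaxSim Resonance step described above can be sketched in a few lines. The paper's exact formulation is not reproduced here; this is a minimal interpretation assuming each agent is represented by embeddings of its discrete usage examples, with the agent scored by the single best-matching example rather than by one pooled vector (the pooling is what causes semantic dilution).

```python
import numpy as np

def maxsim_score(query_vec, example_vecs):
    """Score an agent by the MAXIMUM cosine similarity between the query
    embedding and the embeddings of its discrete usage examples, instead
    of one averaged vector that blurs the agent's distinct capabilities."""
    q = query_vec / np.linalg.norm(query_vec)
    E = example_vecs / np.linalg.norm(example_vecs, axis=1, keepdims=True)
    return float(np.max(E @ q))

# Toy 2-D embeddings: one agent with two unrelated usage examples.
examples = np.array([[1.0, 0.0],    # e.g. "translate a contract"
                     [0.0, 1.0]])   # e.g. "summarize a meeting"
query = np.array([0.1, 0.9])        # query close to the second example

print(maxsim_score(query, examples))  # near 1.0: the matching example wins

# A monolithic per-agent vector (the mean of the examples) scores lower,
# because averaging dilutes both capabilities.
pooled = examples.mean(axis=0)
print(float(query @ pooled / (np.linalg.norm(query) * np.linalg.norm(pooled))))
```

This max-over-examples operator is the same late-interaction idea used in multi-vector retrievers: precision comes from matching the query against the closest individual example, not an averaged profile.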

Abstract

As the ecosystem of Large Language Model (LLM)-based agents expands rapidly, efficient and accurate Agent Discovery becomes a critical bottleneck for large-scale multi-agent collaboration. Existing approaches typically face a dichotomy: they either rely on heavyweight LLMs for intent parsing, incurring prohibitive latency (often exceeding 30 seconds), or use monolithic vector retrieval that sacrifices semantic precision for speed. To bridge this gap, we propose GRAIL (Granular Resonance-based Agent/AI Link), a novel framework achieving sub-400ms discovery latency without compromising accuracy. GRAIL introduces three key innovations: (1) SLM-Enhanced Prediction, which replaces the generalized LLM parser with a specialized, fine-tuned Small Language Model (SLM) for millisecond-level capability tag prediction; (2) Pseudo-Document Expansion, which augments agent descriptions with synthetic queries to enhance semantic density for robust dense retrieval; and (3) MaxSim Resonance, a fine-grained matching mechanism that computes the maximum similarity between user queries and discrete agent usage examples, effectively mitigating semantic dilution. Validated on AgentTaxo-9K, our new large-scale dataset of 9,240 agents, GRAIL reduces end-to-end discovery latency by over 79× compared to LLM-parsing baselines, while significantly outperforming traditional vector search on Recall@10. This framework offers a scalable, industrial-grade solution for the real-time "Internet of Agents".
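Of the three mechanisms, Pseudo-Document Expansion is the easiest to illustrate. The sketch below shows the doc2query-style idea under stated assumptions: the paper presumably generates synthetic queries with a language model, but here a hypothetical template stub stands in for that generator so the indexing-side effect is visible. All names (`expand_agent_doc`, `generate_queries`) are illustrative, not the paper's API.

```python
# Pseudo-document expansion, doc2query-style: append synthetic user
# queries to an agent's description so the dense-retrieval index sees
# query-like phrasings, not just terse capability specs. In GRAIL the
# queries would come from a generative model; templates stand in here.

def generate_queries(description, num_queries=3):
    """Hypothetical stand-in for an LM that writes synthetic queries."""
    stem = description.lower().rstrip(".")
    candidates = [
        f"which agent can {stem}?",
        f"find a tool that can {stem}",
        f"I need help to {stem}",
    ]
    return candidates[:num_queries]

def expand_agent_doc(name, description, num_queries=3):
    """Build the pseudo-document that actually gets embedded and indexed:
    the original description plus its synthetic queries."""
    lines = [f"{name}: {description}"] + generate_queries(description, num_queries)
    return "\n".join(lines)

doc = expand_agent_doc("WeatherBot", "Fetch hourly weather forecasts.")
print(doc)
```

The expanded document is denser in query-shaped phrasings, so a user query like "which agent can fetch hourly weather forecasts?" lands much closer to it in embedding space than to the bare one-line spec.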