Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
arXiv cs.LG · 17 Mar 2026
Key Points
- OATS (Outcome-Aware Tool Selection) is a method to optimize tool selection in semantic routers for LLM inference gateways, aiming to reduce latency while maintaining or improving accuracy.
- The approach operates offline, adding no parameters or serving-time latency, by interpolating tool embeddings toward the centroid of historically successful queries.
- Empirical results show NDCG@5 improvements from 0.869 to 0.940 on MetaTool and from 0.834 to 0.848 on ToolBench, evaluated on a held-out 30% test split.
- Learned extensions include a 2,625-parameter MLP re-ranker and a 197K-parameter contrastive adapter; the MLP can hurt or match the baseline when data is sparse, while the contrastive adapter provides comparable gains on MetaTool.
- The practical takeaway: start with the zero-cost refinement and add learned components only when data density warrants it; all mechanisms run within single-digit-millisecond CPU budgets.
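The zero-cost refinement described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the interpolation weight `alpha` and the renormalization step are assumptions, chosen to keep embeddings compatible with cosine-similarity routing.

```python
import numpy as np

def refine_tool_embedding(tool_emb, success_query_embs, alpha=0.3):
    """Pull a tool's embedding toward the centroid of queries it
    historically served successfully.

    alpha is a hypothetical interpolation weight (the paper's exact
    setting is not given here); embeddings are assumed unit-normalized.
    """
    centroid = np.mean(success_query_embs, axis=0)
    centroid /= np.linalg.norm(centroid)
    refined = (1 - alpha) * tool_emb + alpha * centroid
    # Renormalize so cosine-similarity routing is unaffected by scale.
    return refined / np.linalg.norm(refined)

# Toy example: the refined embedding moves closer to past successful queries.
tool = np.array([1.0, 0.0])
queries = np.array([[0.6, 0.8], [0.8, 0.6]])
refined = refine_tool_embedding(tool, queries)
centroid = queries.mean(axis=0) / np.linalg.norm(queries.mean(axis=0))
print(float(refined @ centroid) > float(tool @ centroid))  # True
```

Because the update is a one-time offline recomputation of stored vectors, the router's serving path (a nearest-neighbor lookup) is unchanged, which is how the method adds no parameters or serving-time latency.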