Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
arXiv cs.LG · 17 Mar 2026
Key Points
- OATS (Outcome-Aware Tool Selection) is a method to optimize tool selection in semantic routers for LLM inference gateways, aiming to reduce latency while maintaining or improving accuracy.
- The approach operates offline, adding no parameters or serving-time latency, by interpolating tool embeddings toward the centroid of historically successful queries.
- Empirical results show NDCG@5 improvements from 0.869 to 0.940 on MetaTool and from 0.834 to 0.848 on ToolBench, evaluated on a held-out 30% test split.
- Learned extensions include a 2,625-parameter MLP re-ranker and a 197K-parameter contrastive adapter; the MLP can hurt or match the baseline when data is sparse, while the contrastive adapter provides comparable gains on MetaTool.
- The practical takeaway: start with the zero-cost refinement and add learned components only when data density warrants it; all mechanisms run within single-digit-millisecond CPU budgets.
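The zero-cost refinement described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the interpolation weight `alpha` and the renormalization step are assumptions, chosen to keep embeddings compatible with cosine-similarity routing.

```python
import numpy as np

def refine_tool_embedding(tool_emb, success_query_embs, alpha=0.3):
    """Pull a tool's embedding toward the centroid of queries it
    historically served successfully.

    alpha is a hypothetical interpolation weight (the paper's exact
    setting is not given here); embeddings are assumed unit-normalized.
    """
    centroid = np.mean(success_query_embs, axis=0)
    centroid /= np.linalg.norm(centroid)
    refined = (1 - alpha) * tool_emb + alpha * centroid
    # Renormalize so cosine-similarity routing is unaffected by scale.
    return refined / np.linalg.norm(refined)

# Toy example: the refined embedding moves closer to past successful queries.
tool = np.array([1.0, 0.0])
queries = np.array([[0.6, 0.8], [0.8, 0.6]])
refined = refine_tool_embedding(tool, queries)
centroid = queries.mean(axis=0) / np.linalg.norm(queries.mean(axis=0))
print(float(refined @ centroid) > float(tool @ centroid))  # True
```

Because the update is a one-time offline recomputation of stored vectors, the router's serving path (a nearest-neighbor lookup) is unchanged, which is how the method adds no parameters or serving-time latency.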