Outcome-Aware Tool Selection for Semantic Routers: Latency-Constrained Learning Without LLM Inference
arXiv cs.LG / 3/17/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- OATS (Outcome-Aware Tool Selection) optimizes tool selection in semantic routers that front LLM inference gateways, aiming to reduce routing latency while maintaining or improving accuracy.
- The refinement runs entirely offline and adds no parameters or serving-time latency: each tool's embedding is interpolated toward the centroid of queries that historically succeeded with that tool.
- Empirical results show NDCG@5 improvements from 0.869 to 0.940 on MetaTool and from 0.834 to 0.848 on ToolBench, evaluated on a held-out 30% test split.
- Learned extensions include a 2,625-parameter MLP re-ranker and a 197K-parameter contrastive adapter; the MLP can hurt or match the baseline when data is sparse, while the contrastive adapter provides comparable gains on MetaTool.
- The practical takeaway is to start with the zero-cost refinement and add learned components only when data density warrants it; all mechanisms run within single-digit-millisecond CPU budgets.
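The zero-cost refinement described in the key points can be sketched as below. The function name, the interpolation weight `alpha`, and the re-normalization step are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def refine_tool_embedding(tool_emb: np.ndarray,
                          success_query_embs: np.ndarray,
                          alpha: float = 0.3) -> np.ndarray:
    """Offline refinement sketch (names and alpha are assumptions):
    pull a tool's embedding toward the centroid of query embeddings
    that historically routed to this tool successfully.

    tool_emb:           (d,) embedding of the tool description
    success_query_embs: (n, d) embeddings of historically successful queries
    alpha:              interpolation weight toward the success centroid
    """
    centroid = success_query_embs.mean(axis=0)
    refined = (1.0 - alpha) * tool_emb + alpha * centroid
    # Re-normalize so cosine-similarity routing at serving time is unaffected
    # by the interpolation changing the vector's magnitude.
    return refined / np.linalg.norm(refined)
```

Because the interpolation happens once, offline, the router's serving path is unchanged: it still does a single nearest-neighbor lookup against the (now refined) tool embeddings, which is consistent with the reported zero added parameters and zero serving-time latency.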