TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification

arXiv cs.AI / 4/17/2026

Key Points

TRACER is an open-source routing system that trains lightweight ML surrogates using labeled input-output pairs already collected in production logs from an LLM classification endpoint.
It deploys a surrogate only when a “parity gate” indicates its agreement with the LLM exceeds a user-defined threshold (α), aiming to reduce marginal inference cost.
TRACER creates interpretability artifacts to make the surrogate-to-LLM handoff boundary transparent, including what input regions the surrogate covers, where it plateaus, and why it defers.
On intent-classification benchmarks, surrogate coverage is high: 83–100% on a 77-class task with a Sonnet 4.6 teacher, depending on α, and full replacement of the teacher on a 150-class task. On a natural language inference task, the parity gate correctly refuses deployment because the embedding representation cannot support reliable separation.
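
The parity gate described in the points above can be illustrated with a minimal sketch. It is not TRACER's actual code: the function names `agreement_rate` and `parity_gate` are hypothetical, agreement is assumed to be the plain fraction of held-out traces where the surrogate's label matches the LLM's label, and the comparison is assumed to be inclusive (agreement ≥ α).

```python
from typing import Sequence


def agreement_rate(surrogate_labels: Sequence[str], llm_labels: Sequence[str]) -> float:
    """Fraction of held-out traces where the surrogate matches the LLM label."""
    if len(surrogate_labels) != len(llm_labels):
        raise ValueError("label sequences must have the same length")
    if not llm_labels:
        return 0.0  # no evidence, so report no agreement
    matches = sum(s == t for s, t in zip(surrogate_labels, llm_labels))
    return matches / len(llm_labels)


def parity_gate(surrogate_labels: Sequence[str], llm_labels: Sequence[str], alpha: float) -> bool:
    """Allow deployment only if agreement reaches the threshold alpha.

    Assumption: inclusive comparison (>=); the source only says "exceeds".
    """
    return agreement_rate(surrogate_labels, llm_labels) >= alpha


# Hypothetical held-out traces: the LLM's labels act as the reference.
llm = ["refund", "balance", "card_lost", "transfer"]
surrogate = ["refund", "balance", "card_lost", "balance"]

print(agreement_rate(surrogate, llm))      # 3 of 4 labels match
print(parity_gate(surrogate, llm, 0.70))   # gate opens
print(parity_gate(surrogate, llm, 0.90))   # gate stays closed
```

With a stricter α, the same surrogate is refused, which is the behaviour reported for the natural language inference task.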

Abstract

Every call to an LLM classification endpoint produces a labeled input-output pair already retained in production logs. These pairs constitute a free, growing training set: a lightweight surrogate trained on them can absorb a significant portion of future traffic at near-zero marginal inference cost. The open questions are when the surrogate is reliable enough to deploy, what it handles versus defers, and how that boundary evolves as data accumulates. We introduce TRACER (Trace-based Adaptive Cost-Efficient Routing), an open-source system that trains ML surrogates on an LLM's own production traces and governs deployment through a parity gate: the surrogate is activated only when its agreement with the LLM exceeds a user-specified threshold α. To make the routing boundary transparent, TRACER generates interpretability artifacts describing which input regions the surrogate handles, where it plateaus, and why it defers. On a 77-class intent benchmark with a Sonnet 4.6 teacher, TRACER achieves 83–100% surrogate coverage depending on the quality target α; on a 150-class benchmark, the surrogate fully replaces the teacher. On a natural language inference task, the parity gate correctly refuses deployment because the embedding representation cannot support reliable separation. The system is available as open-source software.
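
The abstract reports that coverage varies with the quality target α but does not say how the surrogate decides which inputs to defer. One plausible mechanism, shown here purely as an assumption and not as TRACER's method, is a confidence threshold: the surrogate answers only when its confidence is high enough, and the threshold is chosen on held-out traces so that agreement on the accepted inputs stays at or above α. The names `pick_threshold` and `route` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical held-out trace: (surrogate confidence, surrogate agreed with LLM)
Trace = Tuple[float, bool]


def pick_threshold(traces: List[Trace], alpha: float) -> Optional[Tuple[float, float]]:
    """Return (confidence threshold, coverage) giving the largest coverage
    whose accepted traces agree with the LLM at a rate >= alpha.

    Returns None when no threshold meets alpha, i.e. the surrogate
    should not be deployed at all.
    """
    ordered = sorted(traces, key=lambda t: t[0], reverse=True)
    best = None
    agreed = 0
    for i, (conf, ok) in enumerate(ordered, start=1):
        agreed += ok
        # Only cut between distinct confidence values.
        if i < len(ordered) and ordered[i][0] == conf:
            continue
        if agreed / i >= alpha:
            best = (conf, i / len(ordered))
    return best


def route(confidence: float, threshold: float) -> str:
    """Send confident inputs to the surrogate and defer the rest to the LLM."""
    return "surrogate" if confidence >= threshold else "llm"


traces = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False)]
print(pick_threshold(traces, alpha=0.75))  # looser target, more coverage
print(pick_threshold(traces, alpha=1.0))   # stricter target, less coverage
```

In this toy data a looser α lets the surrogate cover more traffic, while a stricter α shrinks coverage, which mirrors the trade-off the abstract describes between the quality target and surrogate coverage.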