Task-Aware LLM Routing with Multi-Level Task-Profile-Guided Data Synthesis for Cold-Start Scenarios

arXiv cs.CL / 4/13/2026


Key Points

  • The paper proposes a multi-level task-profile-guided data synthesis method to generate diverse QA pairs that better approximate test-time query distributions, especially when in-domain data is unavailable for cold-start routing.
  • It introduces TRouter, a task-type-aware LLM routing approach that uses latent task-type variables to model query-conditioned cost and performance.
  • TRouter leverages a prior derived from the synthesized hierarchical task taxonomy to regularize routing decisions under limited or missing in-domain training data.
  • Experiments on multiple benchmarks indicate that the synthesis framework reduces cold-start issues and that TRouter improves practical LLM routing quality across both cold-start and in-domain settings.

Abstract

Large language models (LLMs) exhibit substantial variability in performance and computational cost across tasks and queries, motivating routing systems that select models to meet user-specific cost-performance trade-offs. However, existing routers generalize poorly in cold-start scenarios where in-domain training data is unavailable. We address this limitation with a multi-level task-profile-guided data synthesis framework that constructs a hierarchical task taxonomy and produces diverse question-answer pairs to approximate the test-time query distribution. Building on this, we introduce TRouter, a task-type-aware routing approach that models query-conditioned cost and performance via latent task-type variables, with prior regularization derived from the synthesized task taxonomy. This design enhances TRouter's routing utility under both cold-start and in-domain settings. Across multiple benchmarks, we show that our synthesis framework alleviates cold-start issues and that TRouter delivers effective LLM routing.
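
The abstract's core idea — scoring candidate models by query-conditioned performance and cost under latent task types, with a taxonomy-derived prior as regularizer — can be illustrated with a toy sketch. This is a hypothetical illustration, not the paper's implementation: the function `route`, the blending weight `alpha`, and the trade-off weight `lam` are all assumptions introduced here for exposition.

```python
# Hypothetical sketch of task-type-aware routing (not TRouter's actual
# implementation). Each candidate model has estimated performance and cost
# per latent task type; a query's task-type posterior is blended with a
# taxonomy-derived prior, and the model maximizing expected
# cost-performance utility is selected.
import numpy as np

def route(posterior, prior, perf, cost, lam=0.5, alpha=0.3):
    """Return the index of the model maximizing expected utility.

    posterior : (T,) query-conditioned task-type probabilities
    prior     : (T,) taxonomy-derived task-type prior (regularizer)
    perf, cost: (M, T) per-model, per-task-type estimates
    lam       : cost-performance trade-off weight
    alpha     : strength of the prior regularization
    """
    # Blend the query posterior with the taxonomy prior, then renormalize.
    p = (1 - alpha) * np.asarray(posterior) + alpha * np.asarray(prior)
    p = p / p.sum()
    # Expected utility of each model under the blended task-type distribution.
    utility = (perf - lam * cost) @ p
    return int(np.argmax(utility))

# Two models, three task types: model 0 is stronger but much more costly.
perf = np.array([[0.90, 0.60, 0.95],
                 [0.70, 0.65, 0.60]])
cost = np.array([[1.0, 1.0, 1.0],
                 [0.2, 0.2, 0.2]])

print(route(posterior=[0.1, 0.1, 0.8], prior=[1/3, 1/3, 1/3],
            perf=perf, cost=cost, lam=0.4))   # cheap model wins
print(route(posterior=[0.1, 0.1, 0.8], prior=[1/3, 1/3, 1/3],
            perf=perf, cost=cost, lam=0.0))   # ignore cost: strong model wins
```

The sketch shows why the trade-off weight matters: with cost weighted in (`lam=0.4`) the cheaper model is selected, while with `lam=0` the higher-performance model wins. The prior blending mimics how a taxonomy-derived prior could stabilize routing when the query posterior is unreliable in cold-start settings.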