Lightweight Query Routing for Adaptive RAG: A Baseline Study on RAGRouter-Bench

arXiv cs.CL / 4/7/2026

Key Points

The study addresses an efficiency problem in Retrieval-Augmented Generation (RAG): choosing, for each query, the retrieval strategy that fits its query type, so that token cost drops without sacrificing capability.
It provides the first systematic evaluation of lightweight classifier-based query routing on RAGRouter-Bench. It tests five classical classifiers under three feature regimes (TF-IDF, MiniLM sentence embeddings, and hand-crafted structural features), for 15 feature/classifier combinations.
The best-performing setup, TF-IDF features with an SVM, reaches 0.928 macro-F1 and 93.2% accuracy, with a simulated 28.1% token saving relative to always using the most expensive retrieval paradigm.
Lexical TF-IDF features outperform semantic sentence embeddings by 3.1 macro-F1 points, suggesting that surface keyword patterns are strong predictors of query-type complexity.
Domain-level analysis shows that medical queries are the hardest to route and legal queries the most tractable. The authors frame the results as a reproducible query-side baseline and point to the gap that corpus-aware routing must close.
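The "simulated token savings" figure compares the cost of routed retrieval with the cost of always calling the most expensive paradigm. The sketch below illustrates that comparison under explicit assumptions. The three paradigm tiers, their per-query token costs, and the type-to-paradigm mapping (`PARADIGM_COST`, `TYPE_TO_PARADIGM`) are hypothetical placeholders. They are not values or mappings from the paper, whose actual cost model is not described here.

```python
from collections import Counter

# HYPOTHETICAL per-query token costs for three retrieval tiers.
# Illustrative numbers only; NOT taken from RAGRouter-Bench or the paper.
PARADIGM_COST = {
    "light": 1_000,
    "medium": 3_000,
    "heavy": 6_000,
}

# HYPOTHETICAL mapping from a predicted query type to the tier it is routed to.
TYPE_TO_PARADIGM = {
    "factual": "light",
    "reasoning": "medium",
    "summarization": "heavy",
}


def simulated_token_savings(predicted_types: list[str]) -> float:
    """Fraction of tokens saved by routing vs. always using the costliest tier."""
    if not predicted_types:
        raise ValueError("need at least one query")
    counts = Counter(predicted_types)
    # Cost actually spent when each query goes to its predicted type's tier.
    routed = sum(PARADIGM_COST[TYPE_TO_PARADIGM[t]] * n for t, n in counts.items())
    # Baseline: every query sent to the most expensive tier.
    baseline = max(PARADIGM_COST.values()) * len(predicted_types)
    return 1.0 - routed / baseline


if __name__ == "__main__":
    predicted = ["factual", "factual", "reasoning", "summarization"]
    print(f"{simulated_token_savings(predicted):.1%}")  # prints 54.2%
```

With two factual queries, one reasoning query, and one summarization query, the routed cost is 11,000 tokens against a 24,000-token baseline, or about 54.2% saved. The sketch counts tokens only. It does not model the answer-quality penalty of a misrouted query.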

Abstract

Retrieval-Augmented Generation pipelines span a wide range of retrieval strategies that differ substantially in token cost and capability. Selecting the right strategy per query is a practical efficiency problem, yet no routing classifiers have been trained on RAGRouter-Bench \citep{wang2026ragrouterbench}, a recently released benchmark of 7,727 queries spanning four knowledge domains, each annotated with one of three canonical query types: factual, reasoning, and summarization. We present the first systematic evaluation of lightweight classifier-based routing on this benchmark. Five classical classifiers are evaluated under three feature regimes, namely TF-IDF, MiniLM sentence embeddings \citep{reimers2019sbert}, and hand-crafted structural features, yielding 15 classifier-feature combinations. Our best configuration, TF-IDF with an SVM, achieves a macro-averaged F1 of 0.928 and an accuracy of 93.2%, while simulating 28.1% token savings relative to always using the most expensive paradigm. Lexical TF-IDF features outperform semantic sentence embeddings by 3.1 macro-F1 points, suggesting that surface keyword patterns are strong predictors of query-type complexity. Domain-level analysis reveals that medical queries are hardest to route and legal queries most tractable. These results establish a reproducible query-side baseline and highlight the gap that corpus-aware routing must close.
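The abstract reports both macro-averaged F1 and accuracy, and the two metrics can diverge. Accuracy weights every query equally. Macro-F1 averages the per-class F1, 2TP / (2TP + FP + FN), with equal weight per query type, so a router that neglects a rare type is penalized even when its accuracy looks high. Below is a minimal, dependency-free sketch of both metrics. The example labels are invented for illustration and are not drawn from RAGRouter-Bench. The sketch is intended to match scikit-learn's `f1_score(..., average="macro")` with its default label set.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Share of queries whose predicted type matches the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the query was not of type p
            fn[t] += 1  # query of type t was missed
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


if __name__ == "__main__":
    # Invented, imbalanced labels: a router that always predicts "factual".
    gold = ["factual"] * 8 + ["reasoning", "summarization"]
    pred = ["factual"] * 10
    print(f"accuracy={accuracy(gold, pred):.3f} macro_f1={macro_f1(gold, pred):.3f}")
    # prints accuracy=0.800 macro_f1=0.296
```

The degenerate router scores 80% accuracy but only about 0.296 macro-F1, because it earns zero F1 on the reasoning and summarization classes. This gap is why a high macro-F1 such as 0.928 indicates that all three query types are being routed well.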
