Multi-LLM Query Optimization

arXiv cs.LG / 3/27/2026

Key Points

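The paper formulates a robust, offline query-planning problem: when multiple heterogeneous LLMs are queried in parallel to classify an unknown label, minimize total query cost subject to constraints that bound the error for every possible ground-truth label.
It shows that the problem is NP-hard via a reduction from minimum-weight set cover and, given this intractability, proposes a surrogate that decomposes and approximates the error.
The surrogate decomposes the multi-class error with a union bound and replaces it with pairwise comparisons via Chernoff-type concentration inequalities. The result is closed-form and multiplicatively separable in the query counts, and it is guaranteed to be feasibility-preserving: any plan that satisfies the surrogate constraints also satisfies the true error constraints.
The ratio of the surrogate-optimal cost to the true optimal cost converges to 1 as the error tolerances shrink (asymptotic tightness at the optimization level), and the convergence rate is given explicitly.
The paper designs an asymptotic fully polynomial-time approximation scheme (AFPTAS) that returns a surrogate-feasible plan whose cost is within a factor of (1+ε) of the surrogate optimum.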
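
The surrogate idea in the key points above can be illustrated with a small sketch. This is not the paper's construction. It assumes independent responses aggregated by maximum likelihood, and it uses the Bhattacharyya coefficient (the Chernoff bound at exponent 1/2) as the pairwise bound. The response matrices `P`, the costs `COST`, and all function names are hypothetical, and the exhaustive search stands in for a real solver only on tiny instances.

```python
import itertools
import math

# Hypothetical response models: P[k][y][r] = P(model k answers r | true label y).
# All numbers are made up for illustration.
P = [
    # model 0: cheap and noisy
    [[0.6, 0.2, 0.2],
     [0.2, 0.6, 0.2],
     [0.2, 0.2, 0.6]],
    # model 1: expensive and accurate
    [[0.9, 0.05, 0.05],
     [0.05, 0.9, 0.05],
     [0.05, 0.05, 0.9]],
]
COST = [1.0, 4.0]  # hypothetical per-query costs


def bhattacharyya(p, q):
    """Chernoff bound at s = 1/2: sum_r sqrt(p(r) * q(r)), a value in [0, 1]."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def surrogate_error(n, y, models=P):
    """Union bound over rival labels of prod_k rho_k(y, rival) ** n[k]."""
    total = 0.0
    for rival in range(len(models[0])):
        if rival == y:
            continue
        pair = 1.0
        for k, n_k in enumerate(n):
            # Each model contributes one factor, so the bound is
            # multiplicatively separable in the query counts.
            pair *= bhattacharyya(models[k][y], models[k][rival]) ** n_k
        total += pair
    return total


def is_surrogate_feasible(n, alpha, models=P):
    """True if the bound is at most alpha[y] for every possible true label y."""
    return all(surrogate_error(n, y, models) <= alpha[y]
               for y in range(len(alpha)))


def cheapest_plan(alpha, max_queries=30, models=P, cost=COST):
    """Exhaustive search over integer plans; exponential in the number of models."""
    best = None
    for n in itertools.product(range(max_queries + 1), repeat=len(models)):
        if not is_surrogate_feasible(n, alpha, models):
            continue
        c = sum(ci * ni for ci, ni in zip(cost, n))
        if best is None or c < best[0]:
            best = (c, n)
    return best  # (total cost, queries per model), or None if nothing is feasible


if __name__ == "__main__":
    print(cheapest_plan(alpha=[0.05, 0.05, 0.05]))
```

Because each pairwise term upper-bounds the probability that the rival label scores at least as well as the true one, a plan that passes `is_surrogate_feasible` also meets the true per-label error targets under the stated assumptions. The search itself scales exponentially with the number of models, which is the gap an approximation scheme is meant to close.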

Abstract

Deploying multiple large language models (LLMs) in parallel to classify an unknown ground-truth label is a common practice, yet the problem of optimally allocating queries across heterogeneous models remains poorly understood. In this paper, we formulate a robust, offline query-planning problem that minimizes total query cost subject to statewise error constraints which guarantee reliability for every possible ground-truth label. We first establish that this problem is NP-hard via a reduction from the minimum-weight set cover problem. To overcome this intractability, we develop a surrogate by combining a union bound decomposition of the multi-class error into pairwise comparisons with Chernoff-type concentration bounds. The resulting surrogate admits a closed-form, multiplicatively separable expression in the query counts and is guaranteed to be feasibility-preserving. We further show that the surrogate is asymptotically tight at the optimization level: the ratio of surrogate-optimal cost to true optimal cost converges to one as error tolerances shrink, with an explicit rate of $O\left(\log\log(1/\alpha_{\min}) / \log(1/\alpha_{\min})\right)$. Finally, we design an asymptotic fully polynomial-time approximation scheme (AFPTAS) that returns a surrogate-feasible query plan within a $(1+\varepsilon)$ factor of the surrogate optimum.
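
To get a feel for the convergence rate quoted in the abstract, the short sketch below evaluates the term log log(1/α_min) / log(1/α_min) for a few tolerances. The function name `rate_term` and the choice of natural logarithms are assumptions made for illustration. The big-O hides constants, so the values show only how slowly the gap closes, not the actual cost ratio.

```python
import math


def rate_term(alpha_min):
    """Return log(log(1/alpha_min)) / log(1/alpha_min) using natural logs.

    Requires alpha_min < 1/e so that the inner logarithm exceeds 1 and the
    outer logarithm is positive.
    """
    inner = math.log(1.0 / alpha_min)
    if inner <= 1.0:
        raise ValueError("alpha_min must be below 1/e")
    return math.log(inner) / inner


# Tighter tolerances shrink the term, but only logarithmically.
for alpha_min in (1e-2, 1e-4, 1e-8):
    print(f"alpha_min={alpha_min:g}  rate term={rate_term(alpha_min):.3f}")
```

The term decreases as α_min shrinks, which matches the claim that the surrogate becomes tight for small error tolerances, but the decay is slow enough that moderate tolerances may still leave a noticeable gap.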
