One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries

arXiv cs.CL / March 13, 2026

Key Points

  • The paper proposes a central Supervisor architecture for autonomous multimodal query processing that coordinates specialized tools across text, image, audio, video, and document modalities.
  • It introduces RouteLLM for learned routing of text queries and SLM-assisted modality decomposition for non-text paths to dynamically assign subtasks to appropriate tools.
  • Evaluation on 2,847 queries across 15 task categories shows a 72% reduction in time-to-accurate-answer, an 85% reduction in conversational rework, and a 67% reduction in cost compared with a matched hierarchical baseline.
  • The results indicate that centralized orchestration can substantially improve multimodal AI deployment economics while preserving accuracy parity.

Abstract

We present an agentic AI framework for autonomous multimodal query processing that coordinates specialized tools across text, image, audio, video, and document modalities. A central Supervisor dynamically decomposes user queries, delegates subtasks to modality-appropriate tools (e.g., object detection, OCR, speech transcription), and synthesizes results through adaptive routing strategies rather than predetermined decision trees. For text-only queries, the framework uses learned routing via RouteLLM, while non-text paths use SLM-assisted modality decomposition. Evaluated on 2,847 queries across 15 task categories, our framework achieves a 72% reduction in time-to-accurate-answer, an 85% reduction in conversational rework, and a 67% cost reduction compared with the matched hierarchical baseline, while maintaining accuracy parity. These results demonstrate that intelligent centralized orchestration fundamentally improves multimodal AI deployment economics.
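The orchestration pattern the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: every name here (`Supervisor`, `route_text`, `decompose`, `TOOLS`) is an assumption, and the learned RouteLLM router and SLM decomposer are replaced by trivial stand-ins.

```python
# Hedged sketch of the Supervisor pattern: route text-only queries
# through a (stand-in) learned router, decompose multimodal queries
# into per-modality subtasks, delegate to tool stubs, and synthesize.
# All names and heuristics are illustrative, not from the paper.

def route_text(query: str) -> str:
    """Stand-in for learned routing (RouteLLM in the paper):
    send short queries to a cheap model, longer ones to a stronger one."""
    return "strong_llm" if len(query.split()) > 12 else "light_llm"

def decompose(query: str, modalities: list[str]) -> list[tuple[str, str]]:
    """Stand-in for SLM-assisted modality decomposition:
    emit one subtask per detected non-text modality."""
    return [(m, query) for m in modalities if m != "text"]

# Modality-specific tool stubs (object detection, OCR, transcription, ...).
TOOLS = {
    "image": lambda q: f"object-detection({q})",
    "audio": lambda q: f"transcription({q})",
    "document": lambda q: f"ocr({q})",
}

class Supervisor:
    """Central coordinator: routes, delegates, and synthesizes."""

    def handle(self, query: str, modalities: list[str]) -> str:
        if modalities == ["text"]:
            return f"answer via {route_text(query)}"
        parts = [TOOLS[m](sub) for m, sub in decompose(query, modalities)
                 if m in TOOLS]
        return " + ".join(parts)  # synthesis step, greatly simplified

sup = Supervisor()
print(sup.handle("what does the sign say?", ["text", "image"]))
# → object-detection(what does the sign say?)
```

In the paper's framework the routing and decomposition decisions are learned and adaptive; the fixed heuristics above only show where those components sit in the control flow.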