ExecTune: Effective Steering of Black-Box LLMs with Guide Models

arXiv cs.LG / 4/14/2026


Key Points

  • For LLMs used through black-box APIs, recurring inference costs tend to exceed one-time training costs. The paper addresses this by formalizing Guide-Core Policies (GCoP), a framework in which a guide model generates a strategy (an intermediate representation) that a core LLM then executes.
  • It shows theoretically that GCoP performance is strongly governed by guide-averaged executability — the probability that a strategy generated by the guide can be faithfully executed by the core — and argues that prior methods fail to optimize executability sufficiently, yielding brittle strategies and inefficient computation.
  • Building on this analysis, the authors propose ExecTune, a training recipe that combines teacher-guided acceptance sampling, supervised fine-tuning, and structure-aware reinforcement learning to jointly optimize syntactic validity, execution success, and cost efficiency.
  • On mathematical reasoning and code-generation benchmarks, GCoP with ExecTune improves accuracy by up to 9.2% over prior methods while cutting inference cost by up to 22.4%; it also enables Claude Haiku 3.5 to outperform Sonnet 3.5, and supports modular adaptation by updating the guide while keeping the same core.
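The guide/core split described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `GCoP`, `toy_guide`, and `toy_core` are hypothetical stand-ins, and the core's `eval` call is a placeholder for a black-box LLM API call.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a Guide-Core Policy: a guide model emits a
# structured strategy, which a separate black-box core model executes.
@dataclass
class GCoP:
    guide: Callable[[str], str]      # task -> intermediate strategy
    core: Callable[[str, str], str]  # (task, strategy) -> answer

    def solve(self, task: str) -> str:
        strategy = self.guide(task)       # expensive reasoning, reusable
        return self.core(task, strategy)  # cheap execution by the core

# Toy stand-ins: the guide wraps an arithmetic task in a simple strategy
# string; the "core" executes it with eval (placeholder for an API call).
def toy_guide(task: str) -> str:
    return f"compute: {task}"

def toy_core(task: str, strategy: str) -> str:
    expr = strategy.removeprefix("compute: ")
    return str(eval(expr))

policy = GCoP(guide=toy_guide, core=toy_core)
print(policy.solve("2 + 3 * 4"))  # → 14
```

Because the two roles are separate callables, swapping in a newly trained guide while keeping the same core — the modular adaptation the paper highlights — amounts to constructing a new `GCoP` with a different `guide`.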

Abstract

For large language models deployed through black-box APIs, recurring inference costs often exceed one-time training costs. This motivates composed agentic systems that amortize expensive reasoning into reusable intermediate representations. We study a broad class of such systems, termed Guide-Core Policies (GCoP), in which a guide model generates a structured strategy that is executed by a black-box core model. This abstraction subsumes base, supervised, and advisor-style approaches, which differ primarily in how the guide is trained. We formalize GCoP under a cost-sensitive utility objective and show that end-to-end performance is governed by guide-averaged executability: the probability that a strategy generated by the guide can be faithfully executed by the core. Our analysis shows that existing GCoP instantiations often fail to optimize executability under deployment constraints, resulting in brittle strategies and inefficient computation. Motivated by these insights, we propose ExecTune, a principled training recipe that combines teacher-guided acceptance sampling, supervised fine-tuning, and structure-aware reinforcement learning to directly optimize syntactic validity, execution success, and cost efficiency. Across mathematical reasoning and code-generation benchmarks, GCoP with ExecTune improves accuracy by up to 9.2% over prior state-of-the-art baselines while reducing inference cost by up to 22.4%. It enables Claude Haiku 3.5 to outperform Sonnet 3.5 on both math and code tasks, and to come within 1.7% absolute accuracy of Sonnet 4 at 38% lower cost. Beyond efficiency, GCoP also supports modular adaptation by updating the guide without retraining the core.
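The central quantity in the abstract, guide-averaged executability, is a probability averaged over the guide's strategy distribution, so it can be estimated by Monte Carlo sampling. The sketch below is illustrative only: `sample_strategy` and `core_executes` are hypothetical stand-ins with hard-coded toy probabilities, not the paper's models or evaluation protocol.

```python
import random

def sample_strategy(rng: random.Random) -> str:
    # Toy guide: 80% of sampled strategies are syntactically valid.
    return "valid" if rng.random() < 0.8 else "malformed"

def core_executes(strategy: str, rng: random.Random) -> bool:
    # Toy core: a valid strategy executes faithfully 90% of the time;
    # a malformed one never does.
    return strategy == "valid" and rng.random() < 0.9

def guide_averaged_executability(n: int, seed: int = 0) -> float:
    # Monte Carlo estimate: fraction of guide-sampled strategies
    # that the core executes faithfully.
    rng = random.Random(seed)
    hits = sum(core_executes(sample_strategy(rng), rng) for _ in range(n))
    return hits / n

print(guide_averaged_executability(10_000))  # ≈ 0.72 (= 0.8 * 0.9)
```

ExecTune's components map onto the two factors in this toy estimate: supervised fine-tuning and acceptance sampling push up the fraction of valid strategies, while structure-aware reinforcement learning pushes up the probability that a valid strategy executes successfully on the core.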