Guided Speculative Inference for Efficient Test-Time Alignment of LLMs

arXiv stat.ML / April 28, 2026


Key Points

  • The paper introduces Guided Speculative Inference (GSI), an algorithm for efficient reward-guided decoding in large language models at test time.
  • GSI uses a soft best-of-n strategy combined with a reward model r(x,y) and speculative candidate samples generated by a smaller auxiliary model π_S(y|x).
  • The authors provide provable approximations of the optimal tilted policy (based on exp(β·r(x,y))) and of the expected reward under that optimal policy.
  • Experiments across multiple reasoning and academic benchmarks show that GSI achieves higher accuracy than standard soft best-of-n using the auxiliary model and than reward-guided speculative decoding, and in some settings even outperforms soft best-of-n using the base model.
  • Reported end-to-end latency is reduced by up to 28%, and the authors released code on GitHub.
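The soft best-of-n rule mentioned above can be illustrated with a minimal sketch. The function below is a stand-in for the selection step only: candidate generation and the reward model r(x,y) are assumed to exist elsewhere, and this is not the paper's full GSI algorithm.

```python
import math
import random

def soft_best_of_n(candidates, rewards, beta):
    """Sample one candidate with probability proportional to exp(beta * reward).

    This is the standard soft best-of-n rule: as beta grows large it
    approaches hard best-of-n (pick the argmax-reward candidate), while
    beta = 0 reduces to uniform sampling over the n candidates.
    """
    # Subtract the max reward before exponentiating for numerical stability.
    m = max(rewards)
    weights = [math.exp(beta * (r - m)) for r in rewards]
    # Draw one candidate index from the reward-tilted distribution.
    idx = random.choices(range(len(candidates)), weights=weights, k=1)[0]
    return candidates[idx]
```

In GSI the candidates would come from the small auxiliary model rather than the base model, which is what makes the procedure fast.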

Abstract

We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-n test-time scaling with a reward model r(x,y) and speculative samples from a small auxiliary model \pi_S(y\mid x). We provably approximate both the optimal tilted policy \pi_{\beta,B}(y\mid x) \propto \pi_B(y\mid x)\exp(\beta\,r(x,y)) of soft best-of-n under the base model \pi_B, as well as the expected reward under the optimal policy. In experiments on reasoning benchmarks (MATH500, OlympiadBench, Minerva Math, MMLU-STEM, GSM8K) and across different model families, our method achieves higher accuracy than standard soft best-of-n with \pi_S and reward-guided speculative decoding (Liao et al., 2025), and in certain settings even outperforms soft best-of-n with \pi_B, while reducing end-to-end latency by up to 28\%. The code is available at https://github.com/j-geuter/GSI .
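The abstract states that GSI approximates both the tilted base-model policy \pi_{\beta,B} and its expected reward using candidates drawn from the small model \pi_S. One standard way to correct for sampling from \pi_S instead of \pi_B is self-normalized importance weighting with weights proportional to (\pi_B/\pi_S)\exp(\beta r); the sketch below shows that estimator as an illustration, not necessarily the paper's exact procedure.

```python
import math

def snis_expected_reward(logp_base, logp_small, rewards, beta):
    """Self-normalized importance-sampling estimate of the expected reward
    under the tilted base policy pi_{beta,B}, from candidates drawn via pi_S.

    logp_base[i]  : log pi_B(y_i | x) for candidate y_i (hypothetical inputs)
    logp_small[i] : log pi_S(y_i | x)
    rewards[i]    : r(x, y_i)
    Each candidate's weight is proportional to (pi_B/pi_S)(y_i) * exp(beta * r_i).
    """
    log_w = [lb - ls + beta * r
             for lb, ls, r in zip(logp_base, logp_small, rewards)]
    m = max(log_w)  # stabilize before exponentiating
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)
    return sum(wi * ri for wi, ri in zip(w, rewards)) / z
```

When \pi_S equals \pi_B and \beta = 0, the weights are uniform and the estimate reduces to the plain average reward over the candidates, as expected.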