The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling

arXiv cs.AI / 5/5/2026

📰 News · Models & Research

Key Points

  • The paper addresses a key challenge in “reasoning without training”: base LLMs already assign some probability mass to correct multi-step solutions, but inference-time decoding must locate those probability modes efficiently.
  • It proposes Auxiliary Particle Power Sampling (APPS), which approximates a sequence-level power target proportional to p_theta(x)^alpha (with alpha > 1) using a bounded number of particles in a blockwise, parallel way.
  • APPS uses proposal-corrected power reweighting and future-value-guided selection at resampling boundaries to allocate compute across competing prefixes instead of committing to one decoding path.
  • The method includes practical future-value estimates via short-horizon rollouts and an amortized variant that uses a lightweight learned selection head to reduce overhead.
  • Experiments on reasoning benchmarks show that APPS improves the accuracy–runtime trade-off for training-free decoding and indicate that more of the gap to post-trained systems can be recovered via better inference-time power approximation.
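
The reweighting and resampling steps in the bullets above can be sketched with a toy particle loop. This is illustrative only and not code from the paper: `logp_fn` stands in for the base model's next-token log-probabilities, and `sample_block`, `power_weights`, and `resample` are hypothetical helpers. When the model itself is the proposal, the importance weight for a sampled block under the power target p_theta(x)^alpha reduces to p_theta(block)^(alpha - 1).

```python
import math
import random

random.seed(0)

def sample_block(logp_fn, prefix, block_len, vocab):
    """Sample one block of tokens from the base model (the proposal),
    accumulating the block's log-probability under the model."""
    block, logp = [], 0.0
    for _ in range(block_len):
        probs = logp_fn(prefix + block)  # {token: log-prob} for next token
        tok = random.choices(vocab, weights=[math.exp(probs[t]) for t in vocab])[0]
        block.append(tok)
        logp += probs[tok]
    return block, logp

def power_weights(block_logps, alpha):
    """Proposal-corrected weights for target p^alpha with proposal p:
    w_i ∝ p(block_i)^alpha / p(block_i) = exp((alpha - 1) * log p(block_i))."""
    raw = [(alpha - 1.0) * lp for lp in block_logps]
    m = max(raw)                           # subtract max for numerical stability
    w = [math.exp(r - m) for r in raw]
    z = sum(w)
    return [x / z for x in w]

def resample(particles, weights, rng=random):
    """Multinomial resampling: duplicate high-weight prefixes, drop low ones,
    keeping the particle population (and peak memory) bounded."""
    idx = rng.choices(range(len(particles)), weights=weights, k=len(particles))
    return [particles[i][:] for i in idx]

# Toy demo: a fixed model over vocab {0, 1} that prefers token 1.
vocab = [0, 1]
def toy_logp(seq):
    return {0: math.log(0.2), 1: math.log(0.8)}

particles = [[] for _ in range(4)]
results = [sample_block(toy_logp, p, 3, vocab) for p in particles]
for p, (blk, _) in zip(particles, results):
    p.extend(blk)
weights = power_weights([lp for _, lp in results], alpha=2.0)
particles = resample(particles, weights)
```

With alpha = 2, the log-probability gap between competing blocks is doubled before normalization, so higher-probability prefixes survive resampling proportionally more often, which is the mode-seeking bias the power target is meant to induce.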

Abstract

A recurring pattern in "reasoning without training" is that base LLMs already assign non-trivial probability mass to correct multi-step solutions; the bottleneck is locating these modes efficiently at inference time. Power sampling provides a principled way to bias decoding toward such modes by targeting p_theta(x)^alpha with alpha > 1, but practical approximations must account for future-dependent correction factors that determine which prefixes remain promising. We introduce Auxiliary Particle Power Sampling (APPS), a blockwise particle algorithm for approximating the sequence-level power target with a bounded population of partial solutions. APPS propagates hypotheses in parallel using proposal-corrected power reweighting and refines their survival through future-value-guided selection at resampling boundaries. This redistributes finite compute across competing prefixes rather than committing to a single unfolding path, while providing a direct scaling knob in the particle count and predictable peak memory. We instantiate the future-value signal with short-horizon rollouts and also study an amortized variant that replaces rollouts with a lightweight learned selection head. Across reasoning benchmarks, APPS improves the accuracy-runtime trade-off of training-free decoding and suggests that part of the gap to post-trained systems can be recovered through more faithful inference-time power approximation.
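
The short-horizon rollout signal described in the abstract can be illustrated with a small Monte Carlo estimator. This is a sketch under assumed interfaces, not the paper's implementation: `logp_fn`, `rollout_value`, and `selection_probs` are hypothetical names. For each surviving prefix, the model is rolled forward a few tokens several times and the power-tilted continuation weights are averaged; at a resampling boundary, the selection probability combines the current particle weight with this look-ahead value, in the spirit of auxiliary particle filtering.

```python
import math
import random

def rollout_value(logp_fn, prefix, horizon, n_rollouts, vocab, alpha, rng=random):
    """Monte Carlo look-ahead: estimate E[p_theta(continuation)^(alpha - 1)]
    for a prefix by averaging power-tilted weights of short rollouts.
    `logp_fn` maps a token sequence to {token: log-prob} (hypothetical)."""
    total = 0.0
    for _ in range(n_rollouts):
        seq, lp = list(prefix), 0.0
        for _ in range(horizon):
            probs = logp_fn(seq)
            tok = rng.choices(vocab, weights=[math.exp(probs[t]) for t in vocab])[0]
            seq.append(tok)
            lp += probs[tok]
        total += math.exp((alpha - 1.0) * lp)
    return total / n_rollouts

def selection_probs(weights, values):
    """Auxiliary-particle-style selection at a resampling boundary:
    survival probability proportional to current weight times look-ahead value."""
    raw = [w * v for w, v in zip(weights, values)]
    z = sum(raw)
    return [r / z for r in raw]
```

The amortized variant mentioned in the paper would replace `rollout_value` with a learned selection head, trading the rollouts' extra forward passes for a single cheap prediction per prefix.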