Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees
arXiv cs.LG / 4/23/2026
Key Points
- The paper argues that self-consistency methods for LLM inference can be compute-inefficient in domains like math and code because sampling with replacement repeatedly revisits the same prefixes and produces duplicate completions.
- It introduces Distinct Leaf Enumeration (DLE), a deterministic decoding approach that views truncated sampling as traversal of a pruned decoding tree and enumerates its distinct leaves to avoid redundant sampling (see the sketch after this list).
- DLE improves efficiency by increasing coverage of the truncated search space within the same compute budget and by reusing shared prefixes to reduce unnecessary token generation.
- Experiments show that DLE can produce higher-quality reasoning traces than stochastic self-consistency, improving performance across math, coding, and general reasoning tasks.
- The work presents DLE as a practical alternative to sampling-based self-consistency when compute budgets are limited and diversity of completions matters.
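The summary above describes DLE only at a high level. As a rough illustration of the idea, not the authors' implementation, the Python sketch below enumerates the distinct leaves of a decoding tree truncated by a top-p rule; the helper `next_token_probs`, the truncation rule, and all parameter names are assumptions made for the example. Because each prefix is expanded exactly once, shared prefixes are reused and no completion is generated twice.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch of distinct-leaf enumeration over a truncated decoding tree.
# `next_token_probs(prefix)` stands in for one forward pass of the LLM and returns
# (token, probability) pairs for the next position given `prefix`.

def enumerate_distinct_leaves(
    next_token_probs: Callable[[Tuple[int, ...]], List[Tuple[int, float]]],
    eos_token: int,
    max_depth: int,
    top_p: float = 0.9,
    max_leaves: int = 16,
) -> List[Tuple[Tuple[int, ...], float]]:
    """Depth-first traversal of the decoding tree, keeping only tokens inside the
    top-p nucleus at each step. Each prefix is expanded exactly once, so shared
    prefixes are never regenerated and every returned leaf is distinct."""
    leaves: List[Tuple[Tuple[int, ...], float]] = []
    # Stack holds (prefix, product of kept token probabilities along that prefix).
    stack: List[Tuple[Tuple[int, ...], float]] = [((), 1.0)]

    while stack and len(leaves) < max_leaves:
        prefix, prob = stack.pop()
        if prefix and (prefix[-1] == eos_token or len(prefix) >= max_depth):
            leaves.append((prefix, prob))
            continue
        # Truncate: keep the smallest set of tokens whose cumulative mass reaches top_p.
        candidates = sorted(next_token_probs(prefix), key=lambda tp: -tp[1])
        kept, mass = [], 0.0
        for tok, p in candidates:
            kept.append((tok, p))
            mass += p
            if mass >= top_p:
                break
        # Push the least likely child first so the most likely branch is popped
        # (explored) first: a deterministic order, with no sampling anywhere.
        for tok, p in reversed(kept):
            stack.append((prefix + (tok,), prob * p))
    return leaves
```

In a real system `next_token_probs` would be a single forward pass over the model's vocabulary, cached per prefix, and this deterministic traversal would replace the independent sampling runs that self-consistency otherwise spends on duplicate completions.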