Dual-Track CoT: Budget-Aware Stepwise Guidance for Small LMs

arXiv cs.CL / April 29, 2026

Key Points

  • The paper investigates how small language models (around 7–8B parameters) can perform multi-step reasoning under strict compute and token budgets using chain-of-thought prompting.
  • It argues that existing test-time reasoning approaches (e.g., self-consistency, Tree-of-Thoughts, and critique-revise loops) often improve accuracy only at high token cost, and offer no fine-grained control over individual reasoning steps.
  • The proposed “Dual-Track CoT” approach targets this gap with budget-aware, stepwise guidance and controls such as rejecting redundant steps, aiming to improve reliability without increasing token usage (see the sketch after this list).
  • The work frames the contribution as both scientific (testing whether step-level process supervision and simple test-time constraints can substitute for larger model scale or heavy sampling) and practical (relevant to cost- and latency-constrained deployments).
  • The central question is whether small models can achieve reliable reasoning with the same or fewer tokens than prior methods, making it directly relevant for on-device and low-cost inference scenarios.
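
To make the proposed control concrete, here is a minimal Python sketch of the loop the Key Points describe: accept reasoning steps until a token budget is exhausted, and reject steps that nearly duplicate earlier ones. Everything in it (the `is_redundant` check, the `budgeted_cot` loop, the 0.9 similarity threshold, and word-count budgeting) is an illustrative assumption, not the paper's actual algorithm.

```python
# Illustrative sketch only: the real Dual-Track CoT procedure is not
# reproduced here. All names, thresholds, and the word-count "tokenizer"
# below are assumptions made for demonstration.
from difflib import SequenceMatcher
from typing import Iterable, List


def is_redundant(candidate: str, accepted: List[str], threshold: float = 0.9) -> bool:
    """Flag a proposed step that nearly duplicates an already-accepted one."""
    return any(SequenceMatcher(None, candidate, step).ratio() >= threshold
               for step in accepted)


def budgeted_cot(step_source: Iterable[str],
                 token_budget: int = 64,
                 max_rejects: int = 3) -> List[str]:
    """Accept reasoning steps until the budget or the reject limit is hit."""
    accepted: List[str] = []
    used = rejects = 0
    for candidate in step_source:
        cost = len(candidate.split())  # crude word count standing in for tokens
        if used + cost > token_budget:
            break  # hard budget: never emit a step that would overshoot it
        if is_redundant(candidate, accepted):
            rejects += 1  # drop the duplicate; spend no budget on it
            if rejects >= max_rejects:
                break
            continue
        accepted.append(candidate)
        used += cost
    return accepted


if __name__ == "__main__":
    # Canned proposals stand in for an SLM's step generator; note the duplicate.
    proposals = [
        "Break 24 into 20 + 4.",
        "Compute 17 * 20 = 340.",
        "Compute 17 * 20 = 340.",
        "Compute 17 * 4 = 68.",
        "Add 340 + 68 = 408.",
    ]
    for step in budgeted_cot(proposals):
        print(step)
```

In this sketch a rejected duplicate is not charged against the budget, so tokens are spent only on steps that add information; whether the paper counts discarded generations toward its budget is not specified here.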

Abstract

Large Language Models (LLMs) solve many reasoning tasks via chain-of-thought (CoT) prompting, but smaller models (roughly 7–8B parameters) still struggle with multi-step reasoning under tight compute and token budgets. Existing test-time reasoning methods such as self-consistency (sampling multiple rationales and voting), Tree-of-Thoughts (search over intermediate thoughts), and critique-revise loops improve performance, but often at high token cost and without fine-grained step-level control. This project aims to address that gap: can Small Language Models (SLMs) reason reliably using the same or fewer tokens? The question is both scientific and practical. Scientifically, it probes whether process supervision and simple test-time controls (such as token budgets and rejection of redundant steps) can substitute for model scale or large sampling counts. Practically, many deployments (on-device, low-latency, or cost-constrained settings) cannot afford huge models or dozens of sampled rationales per query. A method that improves SLM reasoning at fixed cost would therefore be directly useful.
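
For contrast with the baselines the abstract names, here is a minimal sketch of self-consistency: sample several rationales, majority-vote on the final answers, and tally a token spend that grows linearly with the sample count. The `self_consistency` helper and the `noisy_solver` stub are hypothetical stand-ins, not any library's API; the point is only the cost structure.

```python
# Baseline sketch of self-consistency as described in the abstract: sample
# multiple rationales and majority-vote the answers. `self_consistency` and
# `noisy_solver` are hypothetical stand-ins, not the paper's code.
import random
from collections import Counter
from typing import Callable, List, Tuple


def self_consistency(sample_fn: Callable[[], Tuple[str, str]],
                     n_samples: int = 8) -> Tuple[str, int]:
    """Sample n rationales, vote on the final answers, and tally token spend."""
    answers: List[str] = []
    total_tokens = 0
    for _ in range(n_samples):
        rationale, answer = sample_fn()
        total_tokens += len(rationale.split())  # word count as a token proxy
        answers.append(answer)
    winner, _ = Counter(answers).most_common(1)[0]
    return winner, total_tokens


if __name__ == "__main__":
    random.seed(0)

    def noisy_solver() -> Tuple[str, str]:
        # Stand-in for one sampled CoT: a ~40-word rationale plus an answer
        # that is correct 70% of the time.
        rationale = " ".join(["step"] * 40)
        return rationale, ("408" if random.random() < 0.7 else "398")

    answer, spent = self_consistency(noisy_solver, n_samples=8)
    print(f"majority answer: {answer}, tokens spent: ~{spent}")
```

With eight samples of roughly 40 tokens each, one vote costs on the order of 320 tokens; this multiplicative overhead is what the paper's fixed-budget framing is set against.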