SHAPE: Stage-aware Hierarchical Advantage via Potential Estimation for LLM Reasoning

Key Points

The SHAPE framework aims to improve LLM reasoning under process supervision by distinguishing genuine reasoning progress from verbose output and reducing token inefficiency.

Abstract

Process supervision has emerged as a promising approach for enhancing LLM reasoning, yet existing methods fail to distinguish meaningful progress from mere verbosity, leading to limited reasoning capabilities and unresolved token inefficiency. To address this, we propose Stage-aware Hierarchical Advantage via Potential Estimation (SHAPE), a framework that formalizes reasoning as a trajectory through a state space of empirical solvability. SHAPE introduces a hierarchical credit assignment mechanism: at the segment level, it employs a stage-aware advantage function to prioritize efficient breakthroughs in low-potential states; at the token level, it utilizes entropy-driven redistribution to sharpen execution signals. Extensive experiments in math reasoning across three base models and five benchmarks demonstrate that SHAPE achieves an average accuracy gain of 3% with 30% reduced token consumption.
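The abstract gives no formulas, so the following is only a minimal sketch of how the two levels of credit assignment could fit together, not the authors' implementation. It assumes that a state's potential is the fraction of sampled rollouts from that prefix that reach a correct answer, that the stage-aware weight grows linearly as the potential falls, and that a segment's advantage is spread over its tokens in proportion to token entropy. The function names and the `low_stage_bonus` parameter are hypothetical.

```python
from typing import List


def empirical_potential(rollout_outcomes: List[bool]) -> float:
    """Assumed potential: share of rollouts from a prefix that end correctly."""
    return sum(rollout_outcomes) / len(rollout_outcomes)


def segment_advantages(potentials: List[float],
                       low_stage_bonus: float = 1.0) -> List[float]:
    """Segment-level credit: potential gain, up-weighted in low-potential states.

    potentials[i] is the potential before segment i and potentials[i + 1]
    the potential after it. The linear weight is an illustrative assumption.
    """
    advantages = []
    for prev, nxt in zip(potentials, potentials[1:]):
        gain = nxt - prev
        weight = 1.0 + low_stage_bonus * (1.0 - prev)
        advantages.append(weight * gain)
    return advantages


def redistribute_by_entropy(segment_adv: float,
                            token_entropies: List[float]) -> List[float]:
    """Token-level credit: share a segment's advantage in proportion to entropy.

    The per-token values average to segment_adv, so high-entropy tokens
    receive more than the segment value and low-entropy tokens less.
    """
    n = len(token_entropies)
    if n == 0:
        return []
    total = sum(token_entropies)
    if total <= 0.0:
        # No entropy signal: fall back to a uniform split.
        return [segment_adv] * n
    return [segment_adv * n * h / total for h in token_entropies]


# Usage: three states with rising solvability, then one segment of two tokens.
potentials = [0.25, 0.75, 1.0]
seg_adv = segment_advantages(potentials)
tok_adv = redistribute_by_entropy(seg_adv[0], [0.5, 1.5])
```

In this sketch the first segment, which lifts the potential from 0.25 to 0.75, earns more credit than the second both because its gain is larger and because it starts from a lower-potential state. A real training setup would also need a rule for segmenting the trajectory and for estimating potentials cheaply, neither of which the abstract specifies.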
