RankGuide: Tensor-Rank-Guided Routing and Steering for Efficient Reasoning

arXiv cs.AI / 4/21/2026


Key Points

  • Large reasoning models improve multi-step reasoning via chain-of-thought but are costly in latency and compute, motivating collaborative approaches with smaller models that can fail in hard-to-detect ways.
  • The paper analyzes small reasoning model failures in both generated text and hidden-state spaces, categorizing issues into overconfidence, uncertainty, and heavy revalidation.
  • It introduces RankGuide, which uses tensor-rank signals from consecutive hidden states to route between small and large models only when the small model is likely to fail.
  • RankGuide also adds tensor-rank-filtered steering vector extraction to adjust the small model’s reasoning trajectory and improve generation quality.
  • Experiments across multiple benchmarks show RankGuide can cut latency by up to 1.75× versus using only a large model while keeping accuracy competitive with prior collaborative methods.
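The routing idea in the key points can be illustrated with a small sketch. The summary does not define the tensor-rank signal, so this example swaps in a simpler stand-in: the numerical matrix rank of the last few hidden states stacked as rows. The function names, the window size, and the rank band are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def numerical_rank(rows: List[Vector], tol: float = 1e-8) -> int:
    """Rank of a small matrix via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def should_escalate(
    hidden_states: List[Vector],
    window: int = 4,                      # assumed number of consecutive states
    rank_band: Tuple[int, int] = (2, 3),  # assumed "healthy" rank range
) -> bool:
    """Hypothetical rule: hand off to the large model when the rank of the
    most recent hidden states leaves an assumed healthy band."""
    if len(hidden_states) < window:
        return False  # not enough history to compute a signal
    rank = numerical_rank(list(hidden_states[-window:]))
    low, high = rank_band
    return rank < low or rank > high


# Four states along a single direction, as a repetitive trace might produce.
looping = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
print(should_escalate(looping))  # True: rank 1 is below the assumed band
```

Which direction of rank change signals failure is not stated in this summary, so the sketch flags both a collapse and a spike. A real system would calibrate the band on held-out traces.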

Abstract

Large reasoning models (LRMs) enhance problem-solving capabilities by generating explicit multi-step chain-of-thought (CoT) reasoning; however, they incur substantial inference latency and computational overhead. To mitigate this issue, recent works have explored model collaboration paradigms, where small reasoning models (SRMs) generate intermediate reasoning steps to achieve a better accuracy–latency trade-off. Despite recent progress, effectively and efficiently detecting and mitigating SRM failures in collaborative systems remains a key challenge. To address this issue, we analyze SRM inference in both the generated text and hidden-state spaces, and identify three types of failure modes: overconfidence, uncertainty, and heavy revalidation. Building on these insights, we propose RankGuide, a framework that improves the efficiency and effectiveness of SRM–LRM collaboration through tensor-rank-guided routing and steering. Specifically, RankGuide leverages a routing signal that incorporates tensor-rank signals derived from consecutive hidden states to detect when SRMs are likely to fail and selectively invoke LRMs. In addition, we introduce a tensor-rank-filtered steering vector extraction method to modulate the reasoning trajectory of SRMs, thereby improving their generation quality. By improving both routing and steering through tensor-rank signals, RankGuide enables SRM–LRM collaborative systems to achieve more efficient reasoning with fewer steps and improved accuracy. Experiments on multiple reasoning benchmarks demonstrate the efficacy of RankGuide in reducing latency by up to 1.75× compared to using the LRM alone, while maintaining competitive accuracy relative to prior methods.
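The steering step can be sketched in the same hedged way. The abstract does not give the extraction procedure, so this example assumes a common difference-of-means recipe: average the hidden states from successful steps, subtract the average from failed steps, and use only samples whose rank signal passes a threshold. The sample layout, `min_rank_signal`, and `alpha` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# (hidden state, rank signal, whether the step was judged good): assumed layout
Sample = Tuple[Vector, float, bool]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("cannot average an empty group")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def extract_steering_vector(
    samples: List[Sample], min_rank_signal: float
) -> List[float]:
    """Assumed recipe: drop samples below the rank threshold, then take
    mean(good hidden states) - mean(bad hidden states)."""
    kept = [s for s in samples if s[1] >= min_rank_signal]
    good = [h for h, _, is_good in kept if is_good]
    bad = [h for h, _, is_good in kept if not is_good]
    good_mean = mean_vector(good)
    bad_mean = mean_vector(bad)
    return [g - b for g, b in zip(good_mean, bad_mean)]


def apply_steering(
    hidden: Vector, steering: Vector, alpha: float = 1.0
) -> List[float]:
    """Shift a hidden state along the steering direction, scaled by alpha."""
    return [h + alpha * s for h, s in zip(hidden, steering)]


samples: List[Sample] = [
    ([1.0, 0.0], 3.0, True),
    ([3.0, 0.0], 3.0, True),
    ([0.0, 1.0], 3.0, False),
    ([9.0, 9.0], 0.5, False),  # filtered out: rank signal below threshold
]
steering = extract_steering_vector(samples, min_rank_signal=1.0)
print(steering)                                          # [2.0, -1.0]
print(apply_steering([1.0, 1.0], steering, alpha=0.5))   # [2.0, 0.5]
```

In a real model the shift would be applied to a chosen layer's activations during decoding, and `alpha` would need tuning so that steering changes the trajectory without degrading fluency.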