SeLaR: Selective Latent Reasoning in Large Language Models

arXiv cs.CL / 4/10/2026


Key Points

  • The paper proposes SeLaR (Selective Latent Reasoning), a training-free method that improves chain-of-thought reasoning in large language models by selectively using latent (soft) reasoning only when the model is uncertain.
  • It addresses prior latent-reasoning limitations, including reasoning instability from global soft activation and the tendency of soft embeddings to collapse toward the most likely token.
  • SeLaR uses an entropy-gated mechanism to switch between soft embeddings at low-confidence steps and discrete decoding at high-confidence steps, aiming to preserve stability while enabling exploration.
  • It adds entropy-aware contrastive regularization that discourages soft embeddings from aligning with the dominant token direction, encouraging multiple reasoning trajectories.
  • Experiments across five reasoning benchmarks report consistent performance gains over standard CoT and other training-free approaches.
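The entropy-gated switch described above can be sketched in a few lines. The sketch below is illustrative only and assumes a simple entropy threshold in nats (`tau` is a hypothetical hyperparameter, not a value from the paper): when the next-token distribution is high-entropy (low confidence), the step feeds back a probability-weighted soft embedding; otherwise it falls back to the discrete argmax token's embedding.

```python
import numpy as np

def entropy_gated_step(logits, embedding_table, tau=1.0):
    """One decoding step of an entropy-gated latent-reasoning sketch.

    logits: (V,) next-token logits
    embedding_table: (V, d) token embedding matrix
    tau: hypothetical entropy threshold (nats) for "low confidence"
    Returns (next_input_embedding, mode).
    """
    # softmax over the vocabulary
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Shannon entropy of the next-token distribution
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    if entropy > tau:
        # low confidence: soft embedding keeps alternative tokens alive
        return probs @ embedding_table, "soft"
    # high confidence: discrete decoding preserves stability
    return embedding_table[int(np.argmax(probs))], "discrete"
```

On a sharply peaked distribution the gate selects discrete decoding; on a near-uniform one (entropy ≈ ln V > tau) it selects the soft mixture, matching the selective-activation idea.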

Abstract

Chain-of-Thought (CoT) has become a cornerstone of reasoning in large language models, yet its effectiveness is constrained by the limited expressiveness of discrete token sampling. Recent latent reasoning approaches attempt to alleviate this limitation by replacing discrete tokens with soft embeddings (probability-weighted mixtures of token embeddings) or hidden states, but they commonly suffer from two issues: (1) global activation injects perturbations into high-confidence steps, impairing reasoning stability; and (2) soft embeddings quickly collapse toward the highest-probability token, limiting exploration of alternative trajectories. To address these challenges, we propose SeLaR (Selective Latent Reasoning), a lightweight and training-free framework. SeLaR introduces an entropy-gated mechanism that activates soft embeddings only at low-confidence steps, while preserving discrete decoding at high-confidence steps. Additionally, we propose an entropy-aware contrastive regularization that pushes soft embeddings away from the dominant (highest-probability) token's direction, encouraging sustained exploration of multiple latent reasoning paths. Experiments on five reasoning benchmarks demonstrate that SeLaR consistently outperforms standard CoT and state-of-the-art training-free methods.
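One way to picture the entropy-aware push away from the dominant token is as a scaled projection-removal step on the soft embedding. The sketch below is a geometric illustration under that assumption, not the paper's actual formulation; `alpha` and the entropy-based scaling are hypothetical choices made here so the push strengthens as the distribution sharpens (i.e., as collapse toward the argmax token looms).

```python
import numpy as np

def push_from_dominant(soft_emb, dominant_emb, entropy, max_entropy, alpha=0.5):
    """Illustrative entropy-aware push of a soft embedding away from the
    dominant (highest-probability) token's direction.

    alpha: hypothetical strength hyperparameter in [0, 1]
    entropy / max_entropy: normalized confidence signal; lower entropy
    means stronger push, counteracting collapse toward the argmax token.
    """
    # unit vector along the dominant token's embedding
    d = dominant_emb / (np.linalg.norm(dominant_emb) + 1e-12)
    # component of the soft embedding along that direction
    proj = np.dot(soft_emb, d) * d
    # push harder when the model is more confident (entropy is low)
    strength = alpha * (1.0 - entropy / max_entropy)
    return soft_emb - strength * proj
```

With zero entropy the component along the dominant direction shrinks by a factor of `alpha`, while orthogonal components (the alternative reasoning directions) are untouched.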