Select to Think: Unlocking SLM Potential with Local Sufficiency
arXiv cs.CL · April 30, 2026
Key Points
- Small language models (SLMs) are efficient but typically lack the reasoning quality of larger LLMs, and common fixes that call an external LLM add major latency and cost.
- The paper introduces “local sufficiency,” showing that at reasoning divergence points the LLM’s preferred token often remains within the SLM’s top-K next-token candidates even if it is not the SLM’s top-1 choice.
- Based on this, the authors propose SELECT TO THINK (S2T), reframing the teacher LLM’s role from open-ended generation to selecting among the SLM’s candidate proposals, turning training into discrete candidate ranking.
- They further present S2T-LOCAL, which distills this selection/reranking behavior into the SLM so it can rerank autonomously at inference time without any LLM calls.
- Experiments show that a 1.5B SLM with top-8 candidates recovers the 32B LLM's choice at a 95% hit rate, and that S2T-LOCAL improves greedy decoding by 24.1% on average, matching multi-path self-consistency performance with single-trajectory efficiency.
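The candidate-selection idea above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the vocabulary, scores, and helper names (`topk_candidates`, `teacher_select`) are invented for the example, and real S2T operates on actual SLM/LLM next-token distributions.

```python
# Toy sketch of "local sufficiency" and the S2T selection step
# (illustrative assumptions only; not the authors' code).

def topk_candidates(logits, k):
    """Return the k token ids with the highest scores (the SLM's proposal set)."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

def teacher_select(candidates, teacher_logits):
    """S2T step: the teacher LLM only ranks the SLM's candidates,
    instead of generating over the full vocabulary."""
    return max(candidates, key=lambda i: teacher_logits[i])

# Toy next-token scores over a 6-token vocabulary at a divergence point:
slm_logits     = [2.1, 1.9, 0.3, 1.8, -0.5, 0.0]   # SLM's top-1 is token 0
teacher_logits = [1.0, 3.2, 0.1, 0.4, -1.0, 0.2]   # teacher prefers token 1

cands = topk_candidates(slm_logits, k=3)            # tokens [0, 1, 3]
teacher_top1 = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])

# "Local sufficiency": the teacher's choice lies inside the SLM's top-k set
# even though it is not the SLM's top-1.
assert teacher_top1 in cands

# S2T reframes generation as discrete selection among the k candidates:
chosen = teacher_select(cands, teacher_logits)
print(chosen)  # → 1 (the teacher's preference, recovered from SLM proposals)
```

S2T-LOCAL then distills this ranking behavior back into the SLM, so at inference time the reranking happens with no teacher call at all.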