STATe-of-Thoughts: Structured Action Templates for Tree-of-Thoughts

arXiv cs.CL / 4/1/2026


Key Points

  • Tree-of-Thoughts–style inference-time compute methods can fail to produce meaningful diversity because they rely heavily on high-temperature sampling and provide limited control over the reasoning process.
  • The proposed STATe Of Thoughts (STATe) replaces stochastic sampling with a controller–generator–evaluator framework that uses discrete, interpretable action templates to steer reasoning choices.
  • STATe's structured textual interventions influence LLM generations more reliably and yield higher output diversity than temperature-based sampling.
  • In an argument-generation case study, STATe’s explicit action sequences identify interpretable features that are strongly predictive of output quality.
  • By analyzing associations between action choices and performance, STATe can discover promising regions of the reasoning/action space and guide generation toward them for improved controllability and interpretability.
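
To make the last point concrete, here is a minimal sketch of how associations between action choices and scores might be estimated and used to rank untried action combinations. Everything in it is an assumption for illustration: the action names, the scores, and the estimator itself (a plain difference in mean score between runs that include an action and runs that do not). The source does not say which estimator STATe uses.

```python
from itertools import combinations
from statistics import mean

def action_effects(runs):
    """Map each action to mean(score with it) - mean(score without it).

    `runs` is a list of (actions, score) pairs. This mean-difference
    estimate is an illustrative stand-in, not the paper's method.
    """
    all_actions = sorted({a for actions, _ in runs for a in actions})
    effects = {}
    for a in all_actions:
        with_a = [s for actions, s in runs if a in actions]
        without_a = [s for actions, s in runs if a not in actions]
        if with_a and without_a:  # need both groups to compare
            effects[a] = mean(with_a) - mean(without_a)
    return effects

def unexplored_pairs(runs, actions):
    """Return action pairs that never co-occur in any observed run."""
    seen = {
        frozenset(pair)
        for acts, _ in runs
        for pair in combinations(sorted(set(acts)), 2)
    }
    return [p for p in combinations(sorted(actions), 2) if frozenset(p) not in seen]

def rank_unexplored(runs, actions):
    """Rank untried pairs by the sum of their members' estimated effects."""
    effects = action_effects(runs)
    pairs = unexplored_pairs(runs, actions)
    return sorted(pairs, key=lambda p: sum(effects.get(a, 0.0) for a in p), reverse=True)

# Hypothetical observations: (action sequence, evaluator score).
runs = [
    (("evidence", "example"), 0.8),
    (("evidence", "analogy"), 0.7),
    (("analogy", "rebuttal"), 0.4),
    (("example",), 0.5),
]
actions = ["analogy", "evidence", "example", "rebuttal"]

effects = action_effects(runs)
ranked = rank_unexplored(runs, actions)
print(effects)
print(ranked)
```

On this toy data, "evidence" has the largest positive estimated effect, so the untried pair containing it is ranked first. A real analysis would need many more runs and an estimator that accounts for interactions between actions, which the additive score here ignores.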

Abstract

Inference-Time-Compute (ITC) methods like Best-of-N and Tree-of-Thoughts are meant to produce output candidates that are both high-quality and diverse, but their use of high-temperature sampling often fails to achieve meaningful output diversity. Moreover, existing ITC methods offer limited control over how to perform reasoning, which in turn limits their interpretability. We present STATe Of Thoughts (STATe), an interpretable ITC method that searches over high-level reasoning patterns. STATe replaces stochastic sampling with discrete and interpretable textual interventions: a controller selects actions encoding high-level reasoning choices; a generator produces reasoning steps conditioned on those choices; and an evaluator scores candidates to guide search. This structured approach yields three main advantages. First, action-guided textual interventions reliably influence LLM generations and produce greater response diversity than temperature-based sampling. Second, in a case study on argument generation, STATe's explicit action sequences capture interpretable features that are highly predictive of output quality. Third, estimating the association between performance and action choices allows us to identify promising yet unexplored regions of the action space and steer generation toward them. Together, these results establish STATe both as a practical framework for diverse and controllable text generation and as a tool for understanding the reasoning patterns that drive performance.
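
The controller, generator, and evaluator loop described in the abstract can be illustrated with a small beam-search sketch. This is not the authors' implementation: the action templates in `ACTIONS`, the stub generator (which stands in for an LLM call), the toy evaluator, and the beam-search strategy are all assumptions chosen to keep the example self-contained and deterministic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical action templates; the actual action set is not given in the source.
ACTIONS = {
    "appeal_to_evidence": "Support the claim with a concrete piece of evidence.",
    "address_counterargument": "Raise an objection and respond to it.",
    "give_example": "Illustrate the point with a short example.",
}

@dataclass(frozen=True)
class Candidate:
    actions: Tuple[str, ...]  # explicit, inspectable action sequence
    text: str                 # reasoning text produced so far
    score: float = 0.0

def controller(candidate: Candidate) -> List[str]:
    """Choose which actions to try next; here, every action not yet used."""
    return [a for a in ACTIONS if a not in candidate.actions]

def stub_generator(prefix: str, instruction: str) -> str:
    """Stand-in for an LLM call conditioned on the action's instruction."""
    return f"{prefix} [{instruction}]".strip()

def stub_evaluator(candidate: Candidate) -> float:
    """Toy score counting distinct actions; a real evaluator might be an LLM judge."""
    return float(len(set(candidate.actions)))

def state_search(
    generate: Callable[[str, str], str],
    evaluate: Callable[[Candidate], float],
    depth: int = 2,
    beam_width: int = 2,
) -> List[Candidate]:
    """Beam search over action sequences instead of temperature sampling."""
    beam = [Candidate(actions=(), text="")]
    for _ in range(depth):
        expanded = []
        for cand in beam:
            for action in controller(cand):
                text = generate(cand.text, ACTIONS[action])
                child = Candidate(cand.actions + (action,), text)
                expanded.append(Candidate(child.actions, child.text, evaluate(child)))
        # Keep the top-scoring candidates; ties keep their expansion order.
        beam = sorted(expanded, key=lambda c: c.score, reverse=True)[:beam_width]
    return beam

results = state_search(stub_generator, stub_evaluator, depth=2, beam_width=2)
for r in results:
    print(r.actions, r.score)
```

Because each candidate carries its full action sequence, the search trace can be read directly, which is the property the abstract ties to interpretability. Swapping `stub_generator` and `stub_evaluator` for real model calls would leave the search loop itself unchanged.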