Can we talk about the reasoning token format chaos?

Reddit r/LocalLLaMA / 4/10/2026


Every model family seems to invent its own reasoning token format:
  • Qwen/DeepSeek: <think>...</think>
  • Gemma: <|channel>...<channel|> Ok weird but sure.
  • Gemma again, sometimes: just bare thought with no delimiters at all

vLLM has a --reasoning-parser flag per model, which helps, but that's basically the vLLM maintainers volunteering to play whack-a-mole forever. And if you're doing anything downstream with the raw output, you're still writing your own parser per model.
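For what it's worth, here's a minimal sketch of what "your own parser per model" ends up looking like. The tag registry and function name are made up for illustration (not vLLM's API), and it only handles the delimiter-based formats; the bare-thought case just falls through untouched:

```python
import re

# Hypothetical per-model registry of reasoning delimiters.
# Only covers the <think>-style formats; models that emit bare
# thought text with no delimiters can't be handled this way at all.
REASONING_TAGS = {
    "qwen": ("<think>", "</think>"),
    "deepseek": ("<think>", "</think>"),
}

def split_reasoning(raw: str, model: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from raw model output.

    If the model has no known delimiters, reasoning comes back empty
    and the raw text is returned unchanged.
    """
    tags = REASONING_TAGS.get(model)
    if tags is None:
        return "", raw  # no delimiters: can't separate thought from answer
    open_tag, close_tag = tags
    # Non-greedy match so we stop at the first closing tag.
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    m = re.search(pattern, raw, flags=re.DOTALL)
    if m is None:
        return "", raw
    reasoning = m.group(1).strip()
    answer = (raw[:m.start()] + raw[m.end():]).strip()
    return reasoning, answer

thought, answer = split_reasoning("<think>check units</think>42 km", "qwen")
```

And of course every new model means another entry in the registry, which is exactly the whack-a-mole problem.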

We just went through this with chat templates. Now we're doing it again.

Is this just Google being Google? Anyone seen any actual movement toward standardizing this or are we just vibing?

submitted by /u/ahinkle