The Format Tax

arXiv cs.CL / 4/7/2026


Key Points

  • The paper argues that asking an LLM to output structured formats (JSON/XML/LaTeX/Markdown) should be a formatting choice with no effect on reasoning quality, yet structured-output requirements significantly degrade performance in open-weight models.
  • The study finds that constrained decoding explains only a small part of the loss; most accuracy degradation comes directly from the prompt-level instructions demanding a specific format.
  • Across six open-weight models (and additional comparisons with closed-weight models), the authors show that separating/decoupling reasoning from formatting substantially recovers lost accuracy.
  • The proposed recovery strategies include generating freeform first and then reformatting in a second pass, or allowing extended “thinking” within a single generation before producing the final structured output.
  • Closed-weight models are observed to exhibit little to no “format tax,” implying the issue is largely a current open-weight-model capability gap rather than a fundamental limitation of structured generation.
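The two-pass recovery strategy above can be sketched as a minimal pipeline: the first call reasons freeform with no format instructions, and a second call performs a pure reformatting step. The `generate` function below is a hypothetical stub standing in for an LLM call; it is not the paper's actual code or prompts.

```python
import json

def generate(prompt: str) -> str:
    """Stand-in for an LLM call (hypothetical stub, not the paper's code).

    A real pipeline would send the prompt to a model and return its text.
    """
    if "Reformat" in prompt:
        # Second pass: wrap the freeform answer in the requested JSON shape.
        answer = prompt.split("ANSWER:", 1)[1].strip()
        return json.dumps({"answer": answer})
    # First pass: pretend the model solved the task in freeform prose.
    return "The result is 42."

def solve_decoupled(question: str) -> dict:
    """Two-pass decoupling: reason freeform first, then reformat to JSON."""
    # Pass 1: no format instructions, so reasoning is unconstrained.
    freeform = generate(f"Solve step by step: {question}")
    # Pass 2: a pure formatting task; no new reasoning is required.
    structured = generate(f"Reformat as JSON. ANSWER: {freeform}")
    return json.loads(structured)

print(solve_decoupled("What is 6 x 7?"))
```

The single-generation alternative the paper describes (extended "thinking" before the final structured answer) follows the same principle inside one call: the format constraint is only applied after the reasoning is complete.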

Abstract

Asking a large language model to respond in JSON should be a formatting choice, not a capability tax. Yet we find that structured output requirements (JSON, XML, LaTeX, Markdown) substantially degrade reasoning and writing performance across open-weight models. The research response has focused on constrained decoding, but sampling bias accounts for only a fraction of the degradation. The dominant cost enters at the prompt: format-requesting instructions alone cause most of the accuracy loss, before any decoder constraint is applied. This diagnosis points to a simple principle: decouple reasoning from formatting. Whether by generating freeform first and reformatting in a second pass, or by enabling extended thinking within a single generation, separating the two concerns substantially recovers lost accuracy. Across six open-weight models, four API models, four formats, and tasks spanning math, science, logic, and writing, decoupling recovers most lost accuracy. Notably, most recent closed-weight models show little to no format tax, suggesting the problem is not inherent to structured generation but a gap that current open-weight models have yet to close. Code is available at https://github.com/ivnle/the-format-tax.