Prefix Parsing is Just Parsing

arXiv cs.CL / April 24, 2026


Key Points

  • The paper studies prefix parsing: determining whether a given input prefix can be completed into a full string produced by a grammar, and (in weighted cases) computing prefix probabilities relevant to language modeling and constrained generation.
  • It introduces a “prefix grammar transformation” that reduces prefix parsing to ordinary parsing by constructing a new grammar that generates exactly the prefixes of the original grammar’s strings.
  • By running any existing optimized parsing algorithm on the transformed grammar, the method avoids developing custom prefix-parsing algorithms while remaining efficient (the transformed grammar grows only by a small factor).
  • The authors also propose a strategy using algorithmic differentiation to compute the next-token weight vector, i.e., probabilities/weights for all one-token extensions, supporting efficient next-token prediction.
  • Overall, the work offers a general, practical framework for prefix parsing and next-token weighting that can be plugged into existing parsing implementations.
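To make the transformation concrete, here is a minimal sketch of the classic prefix-closure construction for a context-free grammar, checked by brute-force enumeration on a toy balanced-parentheses grammar. The construction shown (nonterminals `A*`, helper nonterminals `pre_a`) is a standard textbook variant chosen for illustration; the paper's exact transformation may differ in its details.

```python
# Toy CFG for balanced parentheses: S -> ( S ) S | eps.
# Nonterminals are the dict keys; every other symbol is a terminal.
G = {"S": [["(", "S", ")", "S"], []]}

def prefix_grammar(g, start):
    """Classic prefix-closure construction (the paper's exact variant
    may differ).  For each rule A -> X1 ... Xn and each position i, add
        A* -> X1 ... X(i-1) pre(Xi)
    plus A* -> eps, where pre(a) for a terminal a is a fresh nonterminal
    with rules a | eps, and pre(B) for a nonterminal B is B*.
    Then start* derives exactly the prefixes of L(g)."""
    gp = {a: [list(r) for r in rules] for a, rules in g.items()}
    terminals = {x for rules in g.values() for r in rules for x in r if x not in g}
    for t in terminals:
        gp["pre_" + t] = [[t], []]           # pre(a) -> a | eps
    for a, rules in g.items():
        new = [[]]                           # A* -> eps
        for r in rules:
            for i, x in enumerate(r):
                px = x + "*" if x in g else "pre_" + x
                new.append(list(r[:i]) + [px])
        gp[a + "*"] = new
    return gp, start + "*"

def language(g, start, max_len):
    """Brute-force enumeration of all terminal strings of length
    <= max_len derivable from `start` (for checking only)."""
    out, seen, stack = set(), set(), [(start,)]
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        for i, x in enumerate(form):
            if x in g:                       # expand leftmost nonterminal
                for r in g[x]:
                    nf = form[:i] + tuple(r) + form[i + 1:]
                    if sum(1 for y in nf if y not in g) <= max_len:
                        stack.append(nf)
                break
        else:                                # all terminals: a member string
            out.add("".join(form))
    return out

gp, sp = prefix_grammar(G, "S")
full = language(G, "S", 8)                   # L(G), strings of length <= 8
want = {w[:k] for w in full for k in range(len(w) + 1) if k <= 4}
got = language(gp, sp, 4)                    # prefix grammar, length <= 4
print(got == want)                           # True
```

Note that the transformed grammar is only a constant factor larger than the input (one `A*` per nonterminal, one `pre_a` per terminal, and one new rule per symbol occurrence), matching the size claim above, and any off-the-shelf recognizer can be run on `gp` unchanged.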

Abstract

Prefix parsing asks whether an input prefix can be extended to a complete string generated by a given grammar. In the weighted setting, it also provides prefix probabilities, which are central to context-free language modeling, psycholinguistic analysis, and syntactically constrained generation from large language models. We introduce the prefix grammar transformation, an efficient reduction of prefix parsing to ordinary parsing. Given a grammar, our method constructs another grammar that generates exactly the prefixes of its original strings. Prefix parsing is then solved by applying any ordinary parsing algorithm on the transformed grammar without modification. The reduction is both elegant and practical: the transformed grammar is only a small factor larger than the input, and any optimized implementation can be used directly, eliminating the need for bespoke prefix-parsing algorithms. We also present a strategy, based on algorithmic differentiation, for computing the next-token weight vector, i.e., the prefix weights of all one-token extensions, enabling efficient prediction of the next token. Together, these contributions yield a simple, general, and efficient framework for prefix parsing.
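The definitions in the abstract can be illustrated on a hypothetical toy weighted grammar. The sketch below computes prefix weights by brute-force recursion over derivations and then assembles the next-token weight vector as the prefix weights of all one-token extensions, one query per token. This only illustrates what the vector is; it does not implement the paper's algorithmic-differentiation method, which obtains the whole vector at once.

```python
# Hypothetical toy probabilistic CFG:
#   S -> ( S ) S  with prob 0.4,   S -> eps  with prob 0.6.
# This grammar is consistent (its strings carry total mass 1), which
# the u == "" base case below relies on.
RULES = [(("(", "S", ")", "S"), 0.4), ((), 0.6)]

def mass(form, u, depth):
    """Probability mass of derivations of sentential `form` whose
    terminal yield starts with the prefix u (recursion truncated at
    `depth`, giving a lower-bound approximation for long prefixes)."""
    if u == "":
        return 1.0            # prefix consumed; any completion counts
    if not form:
        return 0.0            # nothing left to derive, but u remains
    if depth == 0:
        return 0.0            # truncation
    x, rest = form[0], form[1:]
    if x == "S":              # expand the leftmost nonterminal
        return sum(p * mass(tuple(r) + rest, u, depth - 1) for r, p in RULES)
    return mass(rest, u[1:], depth) if x == u[0] else 0.0

def prefix_weight(u):
    return mass(("S",), u, 40)

# Next-token weight vector after the prefix "(": prefix weights of all
# one-token extensions, computed naively with one query per token.
u = "("
vector = {a: prefix_weight(u + a) for a in ["(", ")"]}
print(round(prefix_weight(u), 3))                    # 0.4
print({a: round(v, 3) for a, v in vector.items()})   # {'(': 0.16, ')': 0.24}
```

As a sanity check, the two one-token extensions partition the mass of the prefix: 0.16 + 0.24 = 0.4, since every string starting with "(" has at least two symbols. In a real system the per-token queries would be replaced by a single pass, which is exactly the efficiency gap the differentiation-based strategy addresses.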