Breaking the Autoregressive Chain: Hyper-Parallel Decoding for Efficient LLM-Based Attribute Value Extraction

arXiv cs.AI / 4/30/2026


Key Points

  • Attribute Value Extraction (AVE) can require generating multiple independent output sequences from the same input, but standard autoregressive decoding is slow because it forces sequential token generation.
  • The paper introduces Hyper-Parallel Decoding (HPD), a decoding algorithm that accelerates offline LLM decoding by sharing memory and computation across batched output sequences.
  • HPD improves efficiency by enabling out-of-order token generation via position ID manipulation, which allows independent value generation within each prompt.
  • Experiments on AVE indicate conditional independence of attribute-value pairs, and stacking multiple documents in a single prompt enables parallel decoding of up to 96 tokens per prompt.
  • Results show up to a 13.8× reduction in inference costs and total inference time without degrading output quality, and the method could generalize to other tasks with independent output structures.

Abstract

Some text generation tasks, such as Attribute Value Extraction (AVE), require decoding multiple independent sequences from the same document context. While standard autoregressive decoding is slow due to its sequential nature, the independence between output sequences offers an opportunity for parallelism. We present Hyper-Parallel Decoding (HPD), a novel decoding algorithm that accelerates offline decoding by leveraging both shared memory and computation across batches. HPD enables out-of-order token generation through position ID manipulation, significantly improving efficiency. Experiments on AVE show that attribute-value pairs are conditionally independent, enabling us to parallelize value generation within each prompt. By further stacking multiple documents within a single prompt, we can decode up to 96 tokens per prompt in parallel. HPD works with any LLM, and reduces both inference cost and total inference time by up to 13.8× without compromising output quality, potentially saving hundreds of thousands of dollars on industry AVE tasks. Although designed for attribute extraction, HPD makes no assumptions unique to the AVE domain and can in theory be applied to other scenarios with independent output structures.
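The mechanism the abstract describes, decoding several independent value sequences inside one prompt, comes down to two bookkeeping tricks: restart the position IDs for each continuation at the end of the shared prefix, and mask attention so each continuation sees only the prefix and its own past tokens. A minimal sketch of that layout (illustrative only; the function name and the flat-list representation are my assumptions, not the paper's implementation):

```python
def hpd_layout(prefix_len, seq_lens):
    """Build position IDs and an attention mask for parallel decoding.

    Tokens are laid out flat as [shared prefix | seq0 | seq1 | ...].
    prefix_len: number of shared context tokens.
    seq_lens: tokens generated so far for each independent output sequence.
    """
    positions = list(range(prefix_len))
    owner = [-1] * prefix_len          # -1 marks shared prefix tokens
    for k, n in enumerate(seq_lens):
        # Each sequence restarts its position counter right after the prefix,
        # so the model sees it as a direct continuation of the context.
        positions += list(range(prefix_len, prefix_len + n))
        owner += [k] * n

    total = len(positions)
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(i + 1):         # causal: only look at earlier flat indices
            # Visible iff j is in the shared prefix or in i's own sequence.
            if owner[j] == -1 or owner[j] == owner[i]:
                mask[i][j] = True
    return positions, mask
```

With this layout, a single forward pass yields next-token logits for every sequence at once, because each sequence's last token attends to exactly what ordinary autoregressive decoding would have shown it, and nothing from its siblings.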