Estimating near-verbatim extraction risk in language models with decoding-constrained beam search

arXiv cs.LG / March 27, 2026


Key Points

  • The paper argues that existing memorization/extraction-risk measurement methods based on greedy decoding miss variation in risk across different sequences and fail to capture near-verbatim extraction cases.
  • It introduces probabilistic extraction, which computes the probability of generating a target suffix from a given prefix under a decoding scheme, but notes that this is computationally tractable only for purely verbatim memorization.
  • The authors propose decoding-constrained beam search to approximate near-verbatim extraction risk efficiently, producing deterministic lower bounds at roughly the cost of 20 Monte Carlo samples per sequence.
  • Experiments show the method reveals substantially more extractable sequences, higher per-sequence extraction probability mass, and model/text-dependent patterns that verbatim-only approaches cannot detect.
  • Overall, the work targets a major privacy/copyright-relevant blind spot by making near-verbatim memorization risk measurable without prohibitive sampling costs.
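To make the sampling cost in the points above concrete, here is a minimal sketch (not the paper's code) of Monte Carlo estimation of near-verbatim extraction probability. The toy next-token model `TRANS`, the helper names, and the use of Hamming distance as the "near-verbatim" criterion are all illustrative assumptions:

```python
import random

# Hypothetical toy "language model": the next-token distribution depends
# only on the previous token. A stand-in for a real LLM's conditional.
TRANS = {
    None: {"a": 0.6, "b": 0.3, "c": 0.1},
    "a":  {"a": 0.5, "b": 0.4, "c": 0.1},
    "b":  {"a": 0.2, "b": 0.6, "c": 0.2},
    "c":  {"a": 0.3, "b": 0.3, "c": 0.4},
}

def next_dist(seq):
    return TRANS[seq[-1] if seq else None]

def mc_near_verbatim_estimate(target, max_dist, n_samples, seed=0):
    """Fraction of sampled continuations within Hamming distance
    max_dist of the target suffix: an unbiased but high-variance
    estimator of the near-verbatim extraction probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        seq = ()
        for _ in range(len(target)):
            dist = next_dist(seq)
            toks = list(dist)
            seq += (rng.choices(toks, weights=[dist[t] for t in toks])[0],)
        if sum(a != b for a, b in zip(seq, target)) <= max_dist:
            hits += 1
    return hits / n_samples
```

When the true extraction probability is small (say on the order of 10^-4), a reliable estimate needs on the order of 1/p samples, which is where the paper's quoted figure of roughly 100,000 samples per sequence comes from.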

Abstract

Recent work shows that standard greedy-decoding extraction methods for quantifying memorization in LLMs miss how extraction risk varies across sequences. Probabilistic extraction -- computing the probability of generating a target suffix given a prefix under a decoding scheme -- addresses this, but is tractable only for verbatim memorization, missing near-verbatim instances that pose similar privacy and copyright risks. Quantifying near-verbatim extraction risk is expensive: the set of near-verbatim suffixes is combinatorially large, and reliable Monte Carlo (MC) estimation can require ~100,000 samples per sequence. To mitigate this cost, we introduce decoding-constrained beam search, which yields deterministic lower bounds on near-verbatim extraction risk at a cost comparable to ~20 MC samples per sequence. Across experiments, our approach surfaces information invisible to verbatim methods: many more extractable sequences, substantially larger per-sequence extraction mass, and patterns in how near-verbatim extraction risk manifests across model sizes and types of text.
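The lower-bound idea in the abstract can be sketched as follows. This is a toy illustration, not the paper's implementation: it assumes a small bigram-style next-token model and Hamming distance as the near-verbatim criterion, and the beam prunes any partial sequence whose prefix already exceeds the distance budget. Because beam search enumerates only a subset of near-verbatim continuations, summing their probabilities gives a deterministic lower bound on the exact near-verbatim mass:

```python
import itertools

# Hypothetical toy "language model": next-token distribution depends
# only on the last token (a stand-in for a real LLM's conditional).
VOCAB = ["a", "b", "c"]
TRANS = {
    None: {"a": 0.6, "b": 0.3, "c": 0.1},
    "a":  {"a": 0.5, "b": 0.4, "c": 0.1},
    "b":  {"a": 0.2, "b": 0.6, "c": 0.2},
    "c":  {"a": 0.3, "b": 0.3, "c": 0.4},
}

def next_dist(seq):
    return TRANS[seq[-1] if seq else None]

def seq_prob(seq):
    p, prev = 1.0, ()
    for tok in seq:
        p *= next_dist(prev)[tok]
        prev += (tok,)
    return p

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def near_verbatim_mass_exact(target, max_dist):
    # Brute force over all same-length sequences: only feasible for tiny
    # vocabularies, which is exactly the intractability the paper targets.
    return sum(seq_prob(s) for s in itertools.product(VOCAB, repeat=len(target))
               if hamming(s, target) <= max_dist)

def near_verbatim_lower_bound(target, max_dist, beam_width):
    # Decoding-constrained beam search (sketch): expand each beam, drop
    # candidates whose prefix already violates the distance budget, and
    # keep the beam_width most probable survivors. The final mass is a
    # deterministic lower bound on the exact near-verbatim mass.
    beams = [((), 1.0)]
    for t in range(len(target)):
        expanded = []
        for seq, p in beams:
            for tok, q in next_dist(seq).items():
                cand = seq + (tok,)
                if hamming(cand, target[:t + 1]) <= max_dist:
                    expanded.append((cand, p * q))
        expanded.sort(key=lambda x: x[1], reverse=True)
        beams = expanded[:beam_width]
    return sum(p for _, p in beams)
```

The prefix constraint is sound because Hamming distance between prefixes can only grow as the sequence extends, so no sequence that ends up near-verbatim is ever pruned by the constraint; only the beam-width cutoff loses mass, which is what makes the result a lower bound rather than an approximation of unknown sign.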