DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification

arXiv cs.CL / 4/10/2026


Key Points

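  • DIVERSED (Dynamic Verification Relaxed Speculative Decoding) aims to speed up inference by relaxing the main bottleneck of conventional speculative decoding: strict verification, which requires the accepted-token distribution to match the target model exactly.
  • An "ensemble verifier" combines the draft and target model distributions, weighting them according to task and context, so that more plausible tokens can be accepted.
  • The authors give a theoretical justification and report experiments showing substantially higher inference efficiency (time efficiency) than standard speculative decoding while maintaining generation quality.
  • The code is public on GitHub (comeusr/diversed), so the method can be reproduced and checked.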
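
The relaxed verification described in the points above can be sketched for a single drafted token. This is a minimal illustration, not the authors' implementation: it assumes the verifier is a linear blend of the two distributions and reuses the standard speculative-sampling accept/resample rule with the blend in place of the target. The function names and the fixed weight `w` are hypothetical; in DIVERSED the weight is learned and depends on task and context.

```python
import random

def blend(p_target, q_draft, w):
    """Assumed verifier distribution: (1 - w) * target + w * draft."""
    return [(1.0 - w) * p + w * q for p, q in zip(p_target, q_draft)]

def acceptance_rate(verifier, q_draft):
    """Expected probability that a token sampled from q_draft is accepted."""
    return sum(min(v, q) for v, q in zip(verifier, q_draft))

def verify_token(token, p_target, q_draft, w, rng):
    """Accept or replace one drafted token (token must have q_draft > 0).

    Returns (emitted_token, accepted). With w = 0 this reduces to the
    strict rule of standard speculative sampling.
    """
    v = blend(p_target, q_draft, w)
    if rng.random() < min(1.0, v[token] / q_draft[token]):
        return token, True
    # Rejected: resample from the positive part of (verifier - draft).
    residual = [max(vi - qi, 0.0) for vi, qi in zip(v, q_draft)]
    return rng.choices(range(len(v)), weights=residual)[0], False

# Toy three-token vocabulary (made-up numbers, for illustration only).
p = [0.6, 0.3, 0.1]   # target model
q = [0.3, 0.3, 0.4]   # draft model
strict = acceptance_rate(blend(p, q, 0.0), q)
relaxed = acceptance_rate(blend(p, q, 0.5), q)
```

On this toy example the expected acceptance rate rises from about 0.70 under strict verification to about 0.85 with `w = 0.5`. The cost is that emitted tokens follow the blended distribution rather than the target model's.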

Abstract

Speculative decoding is an effective technique for accelerating large language model inference by drafting multiple tokens in parallel. In practice, its speedup is often bottlenecked by a rigid verification step that strictly enforces the accepted token distribution to exactly match the target model. This constraint leads to the rejection of many plausible tokens, lowering the acceptance rate and limiting overall time speedup. To overcome this limitation, we propose Dynamic Verification Relaxed Speculative Decoding (DIVERSED), a relaxed verification framework that improves time efficiency while preserving generation quality. DIVERSED learns an ensemble-based verifier that blends the draft and target model distributions with a task-dependent and context-dependent weight. We provide theoretical justification for our approach and demonstrate empirically that DIVERSED achieves substantially higher inference efficiency compared to standard speculative decoding methods. Code is available at: https://github.com/comeusr/diversed.
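
The trade-off in the abstract can be made concrete with a short worked equation. It rests on an assumption not stated in the abstract: that the verifier is a linear blend $v_w = (1-w)\,p + w\,q$ of the target distribution $p$ and the draft distribution $q$ with weight $w \in [0,1]$, used in the standard accept/resample rule. The paper's actual verifier and guarantees may differ.

```latex
\begin{align*}
\alpha(w) &= \sum_x \min\bigl(q(x),\, v_w(x)\bigr)
           = 1 - \mathrm{TV}(q, v_w)
           = 1 - (1-w)\,\mathrm{TV}(p, q), \\
\mathrm{TV}(v_w, p) &= \tfrac{1}{2}\sum_x \bigl|\,w\,(q(x) - p(x))\,\bigr|
           = w\,\mathrm{TV}(p, q).
\end{align*}
```

Here $\alpha(w)$ is the per-token acceptance probability and $\mathrm{TV}$ is total variation distance. Under this assumption, setting $w = 0$ recovers strict verification with $\alpha = 1 - \mathrm{TV}(p, q)$. Increasing $w$ raises acceptance linearly while moving the emitted-token distribution away from the target by $w\,\mathrm{TV}(p, q)$, which is why a weight chosen per task and context matters.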