SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding

arXiv cs.LG / 3/20/2026

Key Points

SpecForge is introduced as an open-source, production-oriented framework for training speculative decoding models with full support for EAGLE-3.
It combines target-draft decoupling, hybrid parallelism, optimized training kernels, and integration with production-grade inference engines, speeding up EAGLE-3 training by up to 9.9x for Qwen3-235B-A22B.
The project releases SpecBundle, a suite of production-grade EAGLE-3 draft models trained with SpecForge for mainstream open-source LLMs, addressing the scarcity of high-quality drafts.
A systematic study of speculative decoding training recipes yields draft models that reach end-to-end inference speedups of up to 4.48x on SGLang, positioning SpecForge as a practical foundation for real-world deployment.

Abstract

Large language models incur high inference latency due to sequential autoregressive decoding. Speculative decoding alleviates this bottleneck by using a lightweight draft model to propose multiple tokens for batched verification. However, its adoption has been limited by the lack of high-quality draft models and scalable training infrastructure. We introduce SpecForge, an open-source, production-oriented framework for training speculative decoding models with full support for EAGLE-3. SpecForge incorporates target-draft decoupling, hybrid parallelism, optimized training kernels, and integration with production-grade inference engines, enabling up to 9.9x faster EAGLE-3 training for Qwen3-235B-A22B. In addition, we release SpecBundle, a suite of production-grade EAGLE-3 draft models trained with SpecForge for mainstream open-source LLMs. Through a systematic study of speculative decoding training recipes, SpecBundle addresses the scarcity of high-quality drafts in the community, and our draft models achieve up to 4.48x end-to-end inference speedup on SGLang, establishing SpecForge as a practical foundation for real-world speculative decoding deployment.
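The draft-then-verify loop described in the abstract can be illustrated with a minimal sketch of greedy speculative decoding. This is not SpecForge's API and it does not model EAGLE-3's draft architecture; the names `speculative_decode`, `toy_target`, `toy_draft`, and `greedy_decode` are hypothetical, and the "models" are toy deterministic functions chosen only so that the example runs. The point it shows is that the output matches plain greedy decoding with the target model while needing fewer target verification rounds.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]  # greedy next-token predictor


def speculative_decode(
    target: NextTokenFn,
    draft: NextTokenFn,
    prompt: List[Token],
    max_new_tokens: int,
    num_draft_tokens: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns (tokens, verification rounds)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    rounds = 0
    while len(tokens) < goal:
        # 1) The cheap draft model proposes a short continuation.
        proposal: List[Token] = []
        for _ in range(num_draft_tokens):
            proposal.append(draft(tokens + proposal))

        # 2) The target checks every proposed position. A real engine does
        #    this in one batched forward pass; this sketch loops instead.
        rounds += 1
        accepted: List[Token] = []
        for i, proposed in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if proposed == expected:
                accepted.append(proposed)
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the same verification pass
            # also yields one extra token from the target.
            accepted.append(target(tokens + proposal))
        tokens.extend(accepted)
    return tokens[:goal], rounds


# Toy stand-ins (assumptions for illustration only).
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] * 3 + 1) % 17


def toy_draft(ctx: List[Token]) -> Token:
    # Agrees with the target except at every third context length.
    guess = toy_target(ctx)
    return (guess + 1) % 17 if len(ctx) % 3 == 0 else guess


def greedy_decode(model: NextTokenFn, prompt: List[Token], n: int) -> List[Token]:
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(model(tokens))
    return tokens


if __name__ == "__main__":
    out, rounds = speculative_decode(toy_target, toy_draft, [2], 20)
    print("matches greedy target decoding:", out == greedy_decode(toy_target, [2], 20))
    print("target verification rounds:", rounds, "for 20 new tokens")
```

In this sketch the speedup comes entirely from how often the draft agrees with the target: the more tokens accepted per round, the fewer target passes are needed. That is why draft quality, the gap the abstract says SpecBundle addresses, translates directly into inference speed.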

