🚀 Big update for the LocalLlama community: Multi-Token Prediction (MTP) is coming to mlx-lm for the Qwen3.5 series.
(not my PR, just sharing because this is cool 👇)
Early support for generating multiple tokens per forward pass is in, and the gains already look solid:
• 15.3 → 23.3 tok/s (~1.5x throughput boost)
• ~80.6% acceptance rate
The author of the PR benchmarked with Qwen3.5-27B 4-bit on an M4 Pro.
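For intuition on how an ~80% acceptance rate relates to the ~1.5x speedup, here's a toy back-of-the-envelope model (my simplification, not the PR's actual implementation): if each forward pass drafts extra tokens and each draft is accepted independently with some probability, the expected tokens emitted per pass gives an upper bound on the speedup.

```python
# Toy model of MTP/speculative throughput. Illustrative only: assumes
# independent per-token acceptance, which is a simplification.

def expected_tokens_per_pass(acceptance_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per forward pass when `draft_len` extra
    tokens are drafted, each accepted with `acceptance_rate`, and the
    first rejection stops the run."""
    return 1.0 + sum(acceptance_rate ** i for i in range(1, draft_len + 1))

# With the ~80.6% acceptance rate reported in the PR and one extra draft token:
print(f"{expected_tokens_per_pass(0.806, 1):.2f} tokens/pass")  # → 1.81 tokens/pass
```

That ~1.8x is a ceiling; the measured ~1.5x (15.3 → 23.3 tok/s) is plausible once verification overhead is accounted for.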
Huge kudos to AirRunner for contributing this 🙌
PR: https://github.com/ml-explore/mlx-lm/pull/990
