DFlash Doubles the T/S Gen Speed of Qwen3.5 27B (BF16) on Mac M5 Max

Reddit r/LocalLLaMA / 4/15/2026

Key Points

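Early test results suggest that the DFlash support added in oMLX 0.3.5 RC1 substantially improves generation throughput for Qwen3.5 27B (BF16) on a Mac M5 Max (128GB).
Generation speed (T/S, tokens per second) reportedly rose from 9 to 22 T/S, which could make local deployment of this model practical now that speed is less of a bottleneck.
The setup used Jackrong/MLX-Qwopus3.5-27B-v3-bf16 as the main model and z-lab/Qwen3.5-27B-DFlash as the draft model, so the speedup depends on draft-based inference.
The DFlash implementation is published on GitHub (bstnxbt/dflash-mlx); the author notes that it has not yet been tested with OpenCode or other harnesses.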

The new DFlash support in oMLX 0.3.5 RC1 looks like it more than doubles (!!!) the speed of Qwen3.5 27B (BF16). In an initial test, generation went from 9 to 22 T/S (tokens per second), roughly 2.4x!
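For a sense of scale, here is the arithmetic behind those two numbers as a quick sketch. The 9 and 22 T/S figures are from the test above; the 1000-token reply length is just an illustrative assumption, not something that was measured.

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Ratio of new throughput to baseline throughput."""
    return new_tps / baseline_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock generation time for `tokens` at a steady `tps`."""
    return tokens / tps


baseline_tps = 9.0      # reported T/S without DFlash
dflash_tps = 22.0       # reported T/S with DFlash
reply_tokens = 1000     # hypothetical reply length, for illustration only

ratio = speedup(baseline_tps, dflash_tps)
before = seconds_for(reply_tokens, baseline_tps)
after = seconds_for(reply_tokens, dflash_tps)

print(f"speedup: {ratio:.2f}x")
print(f"{reply_tokens} tokens: {before:.0f}s -> {after:.0f}s")
```

This ignores prompt processing time, so real end-to-end gains would likely be smaller for long prompts.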

Models used (HuggingFace)

Main Model: Jackrong/MLX-Qwopus3.5-27B-v3-bf16
Draft Model: z-lab/Qwen3.5-27B-DFlash
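For anyone wondering why a second "draft" model is needed: setups like this generally follow the draft-and-verify pattern (speculative decoding), where a small model proposes several tokens and the big model checks them in one pass. Below is a minimal toy sketch of the generic greedy version. It is not DFlash's actual algorithm or the oMLX API; the function names and toy "models" are made up for illustration.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_generate(target: NextToken, draft: NextToken,
                         prompt: List[Token], max_new: int,
                         k: int = 4) -> Tuple[List[Token], int]:
    """Generic greedy draft-and-verify loop (hypothetical helper).

    Returns (tokens, target_passes). A real engine verifies all k drafted
    positions in one batched forward pass of the target; here that is
    simulated by counting one pass per round.
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks the proposal (counted as one pass).
        passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
        else:
            # Every drafted token matched, so the target adds one more token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:len(prompt) + max_new], passes


# Toy stand-ins: the "target" counts upward mod 10, the "draft" is usually right.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if len(ctx) % 7 == 0 else (ctx[-1] + 1) % 10


def plain_greedy(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


tokens, target_passes = speculative_generate(toy_target, toy_draft, [0], max_new=20)
print(tokens == plain_greedy(toy_target, [0], 20), target_passes)
```

The point of the sketch is that the output matches what the target would generate on its own, but with fewer target passes than tokens. How much faster it gets in practice depends on how often the draft guesses right.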

System: M5 Max 128GB

DFlash on GitHub: https://github.com/bstnxbt/dflash-mlx?tab=readme-ov-file

oMLX (v0.3.5 RC1): https://omlx.ai

I'm not affiliated with any of the developers. Since the Qwen3.5 27B model is so good for the size, with speed being the only thing holding it back, I thought that this may help deploy this model locally at higher quants/full weights.

I've yet to test with OpenCode or other harnesses.

submitted by /u/MiaBchDave