DTree on MLX ... tiny win over DFlash on Qwen3.5-4B (M2 Max)

Reddit r/LocalLLaMA / 4/16/2026



Key Points

A developer ported DTree to MLX and reports a small but repeatable speed improvement on an M2 Max 32GB setup using Qwen3.5-4B (q4_g64), where DTree reaches 48.31 e2e tok/s versus DFlash’s 45.07 e2e tok/s (about 1.07x).
The author notes that many other experimental configurations tried on MLX were flat or worse, suggesting the current improvement is narrow but real enough to share.
They conclude that verifier-side cost in MLX remains the primary bottleneck limiting larger DTree gains.
The post links to the project repository (dtree-mlx) and asks the community whether anyone has achieved bigger DTree performance improvements on MLX.

I ported DTree to MLX ... and finally got one setting that seems to beat matched DFlash locally.

M2 Max 32GB, Qwen3.5-4B, q4_g64, spec=16, tree_budget=24

- DFlash: 45.07 e2e tok/s
- DTree: 48.31 e2e tok/s

So basically ~1.07x over DFlash. Not massive, but at least it looks real and repeatable enough to mention.
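For anyone who wants to check the ratio, here is the arithmetic as a quick sketch. The two throughput figures are the ones reported above; the variable names are only illustrative.

```python
# End-to-end throughput reported in the post (tokens per second).
dflash_tok_s = 45.07
dtree_tok_s = 48.31

# Relative speedup of DTree over DFlash, and the same figure as a percentage gain.
speedup = dtree_tok_s / dflash_tok_s
gain_pct = (speedup - 1.0) * 100.0

print(f"speedup: {speedup:.2f}x, gain: {gain_pct:.1f}%")
# prints "speedup: 1.07x, gain: 7.2%"
```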

A lot of the other things I tried were flat or just worse, so my current read is that MLX verifier cost is still the main limiter here.
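To show why verifier cost can eat a tree's advantage, here is a toy cost model. It is not taken from dtree-mlx, and every number in it is made up for illustration rather than measured. It assumes each decoding step costs one draft pass plus one verify pass and emits some average number of accepted tokens.

```python
def tokens_per_second(accepted_per_step: float, draft_ms: float, verify_ms: float) -> float:
    """Toy model: tokens emitted per step divided by the step time in seconds."""
    step_seconds = (draft_ms + verify_ms) / 1000.0
    return accepted_per_step / step_seconds


# Hypothetical numbers, NOT measurements from the post.
# Chain-style drafting: fewer accepted tokens, cheaper verify pass.
chain = tokens_per_second(accepted_per_step=4.0, draft_ms=10.0, verify_ms=70.0)

# Tree-style drafting: more accepted tokens, but the verifier scores more candidates.
tree_cheap_verify = tokens_per_second(accepted_per_step=4.6, draft_ms=10.0, verify_ms=76.0)
tree_costly_verify = tokens_per_second(accepted_per_step=4.6, draft_ms=10.0, verify_ms=85.0)

print(f"chain:                {chain:.1f} tok/s")
print(f"tree (cheap verify):  {tree_cheap_verify:.1f} tok/s")
print(f"tree (costly verify): {tree_costly_verify:.1f} tok/s")
```

Under these assumed numbers the tree only wins while the extra verify time stays small relative to the extra accepted tokens, which is consistent with most configurations coming out flat or worse.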

Has anyone gotten bigger DTree gains on MLX?
