I ported DTree to MLX ... and finally got one setting that seems to beat matched DFlash locally.
M2 Max 32GB, Qwen3.5-4B, q4_g64, spec=16, tree_budget=24

- DFlash: 45.07 e2e tok/s
- DTree: 48.31 e2e tok/s

So basically ~1.07x over DFlash. Not massive, but at least it looks real and repeatable enough to mention.
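For anyone who wants to sanity-check the ratio, here is the arithmetic as a tiny Python snippet. It only uses the two throughput numbers quoted above; the function and variable names are my own and not from any DTree or MLX code.

```python
# Relative end-to-end throughput of DTree vs. DFlash, from the numbers in this post.
# `speedup` is an illustrative helper name, not part of any library.

def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """Return candidate throughput as a multiple of the baseline."""
    return candidate_tps / baseline_tps

dflash_tps = 45.07  # e2e tok/s, DFlash
dtree_tps = 48.31   # e2e tok/s, DTree

ratio = speedup(dflash_tps, dtree_tps)
gain_pct = (ratio - 1.0) * 100.0

print(f"speedup: {ratio:.3f}x ({gain_pct:.1f}% faster)")
# prints "speedup: 1.072x (7.2% faster)"
```

So the gain is about 7%, which is small enough that comparing several repeated runs per setting is worth doing before reading too much into it.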
A lot of the other things I tried were flat or just worse, so my current read is that MLX verifier cost is still the main limiter here.
Has anyone gotten bigger DTree gains on MLX?