Gemma 4 MTP (Multi-Token Prediction) for the MLX runner
Gemma 4 MTP speculative decoding is now supported on Macs through the MLX runner. It can more than double generation speed for the Gemma 4 31B model on coding tasks.
ollama run gemma4:31b-coding-mtp-bf16
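To check the speedup on your own machine, one option is to read the timing fields that the local Ollama REST API returns with each non-streamed response. The sketch below is a minimal example, not part of this release: it assumes the server is listening on the default `http://localhost:11434`, that the response carries `eval_count` (generated tokens) and `eval_duration` (nanoseconds), and the prompt and the helper names `tokens_per_second` and `generate` are illustrative. Pass any baseline model tag you have pulled as an extra argument to compare against it.

```python
import json
import sys
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Decode throughput from eval_count (tokens) and eval_duration (nanoseconds)."""
    return response["eval_count"] * 1e9 / response["eval_duration"]


def generate(model: str, prompt: str) -> dict:
    """Send one non-streamed generation request and return the parsed JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


if __name__ == "__main__":
    # Defaults to the MTP model from this release; extra tags can be given on the command line.
    models = sys.argv[1:] or ["gemma4:31b-coding-mtp-bf16"]
    prompt = "Write a Python function that merges two sorted lists."  # illustrative prompt
    for model in models:
        result = generate(model, prompt)
        print(f"{model}: {tokens_per_second(result):.1f} tokens/s")
```

Throughput depends on the prompt and on how many drafted tokens are accepted, so a few runs on representative coding prompts give a fairer comparison than a single request.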
What's Changed
- Update MLX and MLX-C with threading fixes by @dhiltgen in #15845
- go: bump to 1.26 by @ParthSareen in #15904
- Add Gemma 4 MTP speculative decoding by @pdevine in #15980
Full Changelog: v0.23.0...v0.23.1-rc0




