Still waiting for Artificial Analysis to publish an intelligence index, but from what I can see it's good. Gemma 26b a4b runs at the same speed on a Mac Studio M1 Ultra as Qwen3.5 35b a3b (~1000 tok/s prompt processing, ~60 tok/s generation at 20k context length, llama.cpp). And in my short test it behaves way, way better than Qwen, not even close. Gemma's chain of thought is concise, helpful and coherent, while Qwen does a lot of inner gaslighting and also loops a lot on default settings. Visual understanding is very good, and multilingual performance seems good as well. Tested Q4_K_XL on both.
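For a rough sense of what those rates mean end to end, here's a back-of-the-envelope calc. Assumes the ~1000/~60 tok/s figures above hold flat across the whole window, which they won't exactly (both degrade as context fills), so treat it as a lower bound:

    # Rough end-to-end latency from the measured rates above.
    # Assumption: rates stay constant over the context window.
    PP_TOK_S = 1000  # prompt processing, tokens/sec
    TG_TOK_S = 60    # token generation, tokens/sec

    def latency_s(prompt_tokens: int, output_tokens: int) -> float:
        return prompt_tokens / PP_TOK_S + output_tokens / TG_TOK_S

    # 20k-token prompt, 1k-token reply:
    print(f"{latency_s(20_000, 1_000):.1f} s")  # ~36.7 s total

Which is exactly why the prompt-caching question below matters: without cache reuse you pay that 20 s prefill on every turn.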
I wonder if mlx-vlm properly handles prompt caching for Gemma (it doesn't work for Qwen 3.5).
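I haven't verified the mlx-vlm path myself, but with plain text-only mlx-lm the pattern is straightforward; a minimal sketch, assuming the usual mlx-lm API (load, generate, make_prompt_cache) and a hypothetical model path:

    # Prompt-cache sketch with mlx-lm (text-only; the mlx-vlm vision
    # path is what's actually in question here and may behave differently).
    from mlx_lm import load, generate
    from mlx_lm.models.cache import make_prompt_cache

    # Hypothetical model repo, just for illustration.
    model, tokenizer = load("mlx-community/some-gemma-model-4bit")

    cache = make_prompt_cache(model)

    # First call pays the full prompt-processing cost and fills the cache.
    generate(model, tokenizer, prompt="<long 20k-token context here>",
             max_tokens=256, prompt_cache=cache)

    # Follow-up call reuses the cached prefix; only new tokens get processed.
    generate(model, tokenizer, prompt="Follow-up question...",
             max_tokens=256, prompt_cache=cache)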
Too bad its KV cache is gonna be monstrous, as the architecture doesn't implement any tricks to reduce it. Hopefully TurboQuant will help with that soon.
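To put a number on "monstrous": a full-attention fp16 KV cache grows linearly with layers × KV heads × head dim × context. A sketch with assumed hyperparameters (I haven't checked Gemma's actual config, so the numbers are illustrative only):

    # KV cache size for full attention, no MLA/sliding-window/KV-quant tricks.
    # Hyperparameters below are ASSUMED for illustration, not Gemma's real config.
    n_layers   = 48
    n_kv_heads = 8
    head_dim   = 128
    ctx_len    = 20_000
    bytes_per  = 2  # fp16/bf16 cache

    # 2x for the separate K and V tensors.
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per
    print(f"{kv_bytes / 1024**3:.1f} GiB")  # ~3.7 GiB at these settings

And it keeps scaling linearly from there, so a full 128k context would be several times that.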
I expect censorship to be dogshit; I saw that e4b loves to refuse any and all medical advice. Maybe good prompting will mitigate that, since "heretic" and "abliterated" versions seem to damage performance in many cases.
No formatting because this is handwritten by a human for a change.