Update on Qwen 3.5 35B A3B on Raspberry Pi 5
Reddit r/LocalLLaMA / 3/12/2026
📰 News
Key Points
- The author demonstrates running Qwen 3.5 35B A3B on Raspberry Pi 5 using a modified llama.cpp workflow (combining the OG repo with ik_llama tweaks) and prompt caching.

Did some more work on my Raspberry Pi inference setup. The demo above is running this specific quant: https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF/blob/main/Qwen3.5-35B-A3B-UD-Q2_K_XL.gguf

Some numbers for what to expect now (all tests at 16k context, vision encoder enabled):

Let me know what you guys think. Also, if anyone has a Pi 5 and wants to try it and poke around, lemme know. I have some other tweaks I'm actively testing (for example asymmetric KV cache quantisation, which is giving some really good boosts in prompt processing).
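The asymmetric KV-cache quantisation the post mentions maps onto llama.cpp's separate key/value cache-type flags (keys at a higher precision than values). A minimal command-line sketch, assuming stock llama.cpp flags; the model path, quant choices, and prompt are illustrative assumptions, not the author's exact settings:

```shell
# Sketch: asymmetric KV cache on llama.cpp (assumed settings, not the author's).
# Keys stay at q8_0 (attention scores are more sensitive to key precision);
# values drop to q4_0 to save RAM on the Pi 5.
# Quantised V cache requires flash attention (-fa) in llama.cpp.
./llama-cli \
  -m Qwen3.5-35B-A3B-UD-Q2_K_XL.gguf \
  -c 16384 \
  -fa \
  --cache-type-k q8_0 \
  --cache-type-v q4_0 \
  --prompt-cache session.bin \
  -p "Hello from the Pi"
```

With `--prompt-cache`, the processed prompt state is saved to `session.bin`, so repeated runs with a shared prefix skip most of the prompt-processing cost on subsequent invocations.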