just quick numbers for anyone interested in the new Snapdragon chipset running Windows on ARM, via llama.cpp
## Hardware
- Snapdragon X2 Elite Extreme (X2E94100, Qualcomm Oryon Gen 3)
- 18 CPU cores
- 48 GB Unified Memory
- ~228 GB/s peak memory bandwidth
- Adreno GPU (unused)
- Decent Hexagon NPU (unused)
- ISA features reported: NEON, FMA, DOTPROD, I8MM, SVE/SVE2, SME/SME2, fp16
- 4096-bit Matrix Engine (SME2) — present in hardware
i couldn't get KleidiAI (SME2) to work (guessing it's a Windows problem?)
llama.cpp does recognize and try to use the Adreno GPU, but everything i've tried pins the Adreno GPU at 100% without ever producing output. So all tests below are CPU only, using the unified memory
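for anyone wanting to reproduce: PP512/TG128 numbers in this format typically come from llama-bench. A sketch (the model filename and thread count here are placeholders, adjust for your setup):

```shell
# prompt processing with 512 tokens, text generation of 128 tokens, CPU only
# -t 18 = one thread per core on this chip
llama-bench -m qwen3.6-35b-a3b-q4_k_m.gguf -p 512 -n 128 -t 18
```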
been using Q5 Qwen3.6 in opencode and it's actually pretty usable! not the fastest, but it's great fun to be able to run it locally; even on battery it chugs along no problem. been impressed with this laptop so far
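the "q8_0 KV + FA" row in the config table corresponds to something like this llama-server invocation (a sketch, not my exact command; flag spellings vary a bit between llama.cpp versions and the model path is a placeholder):

```shell
# serve the model for opencode with flash attention and a quantized KV cache
llama-server -m qwen3.6-35b-a3b-q5_k_m.gguf \
  -c 32768 \
  --flash-attn \
  -ctk q8_0 -ctv q8_0 \
  --port 8080
```

quantizing the KV cache to q8_0 roughly halves its memory vs fp16, which is why it's worth the small TG hit when running long agentic sessions.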
next project is getting a Whisper model running 100% on the NPU (Qualcomm has some literature on this; hopefully it works nicely so i can dictate to CC and opencode at low power draw)
### Q4_K_M comparison across architectures

| Model | Architecture | Size | Active | PP512 | TG128 |
|---|---|---:|---|---:|---:|
| Qwen3-4B | dense | 2.32 GiB | 4B | 248 t/s | 42 t/s |
| Gemma-4-31B-it | dense | 18.24 GiB | 31B | 39 t/s | **6.5 t/s** |
| Gemma-4-26B-A4B-it | MoE | 15.63 GiB | ~4B | 168 t/s | 31 t/s |
| Qwen3.6-35B-A3B | MoE | 19.91 GiB | ~3B | 171 t/s | 33 t/s |

### Qwen3.6-35B-A3B quant + runtime config comparison

| Quant | Size | KV config | PP512 (t/s) | TG128 (t/s) |
|---|---:|---|---:|---:|
| Q4_K_M | 19.91 GiB | fp16, no FA | 171 | 33.0 |
| Q5_K_M | 23.29 GiB | fp16, no FA | 153 | 30.4 |
| **Q5_K_M** | **23.29 GiB** | **q8_0 KV + FA (opencode)** | **145** | **29.6** |
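a rough sanity check on these TG numbers, assuming token generation is memory-bandwidth-bound (each token streams the active weights once). All figures come from the tables above; the only added assumption is the GiB→bytes conversion:

```python
# Back-of-envelope TG ceiling: tokens/s <= bandwidth / bytes read per token.

GIB = 1024 ** 3  # bytes per GiB

def tg_ceiling(bandwidth_gbps: float, active_weight_gib: float) -> float:
    """Upper bound on tokens/s if each token streams the active weights once."""
    return bandwidth_gbps * 1e9 / (active_weight_gib * GIB)

# Dense Gemma 31B at Q4_K_M: all 18.24 GiB are touched per token.
dense = tg_ceiling(228, 18.24)

# MoE Qwen3.6-35B-A3B at Q4_K_M: only ~3B of 35B params active,
# so roughly 19.91 * 3/35 GiB of weights stream per token.
moe = tg_ceiling(228, 19.91 * 3 / 35)

print(f"dense ceiling ~ {dense:.1f} t/s, MoE ceiling ~ {moe:.1f} t/s")
```

the dense ceiling comes out around ~11.6 t/s against the measured 6.5 t/s (~56% of peak bandwidth, plausible for CPU inference), while the MoE models land far below their theoretical ceiling, i.e. they're limited by compute/overhead rather than bandwidth here.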




