I have a 16GB 9070 XT. What settings and what quant size do you use for Qwen3.5-35B-A3B?
I see a lot of people giving love to Qwen3.5-35B-A3B, but I feel like I'm setting it up incorrectly. I'm using llama.cpp.
Can I go up a size in quant?
cmd: C:\llamaROCM\llama-server.exe --port ${PORT} -m "C:\llamaROCM\models\Huihui-Qwen3.5-35B-A3B-abliterated.i1-IQ4_XS.gguf" -c 8192 -np 1 -ngl 99 -ncmoe 16 --flash-attn on --temp 0.7 --top-k 20 --top-p 0.95 --min-p 0.00 --cache-type-k f16 --cache-type-v f16 --threads 12 --context-shift --sleep-idle-seconds 300 -b 4096 -ub 2048
(Note: I dropped the duplicate -fa on since it's the same flag as --flash-attn on.)
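My rough math on whether a bigger quant would even fit. The bits-per-weight figures below are approximate averages for these llama.cpp quant types (real files vary because different tensors get different quant levels), and this ignores KV cache, compute buffers, and other VRAM overhead:

```python
# Rough GGUF weight-size estimate: params (billions) * bits-per-weight / 8.
# bpw values are approximate averages, not exact for any given file.
BPW = {
    "IQ4_XS": 4.25,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.50,
    "Q6_K":   6.56,
    "Q8_0":   8.50,
}

def est_weight_gb(params_b: float, quant: str) -> float:
    """Approximate size of the model weights in GB for a given quant."""
    return params_b * BPW[quant] / 8

# 35B total params (it's A3B, so only ~3B are active per token,
# but ALL experts still have to live somewhere in RAM/VRAM):
for q, bpw in BPW.items():
    print(f"{q:7s} ~{est_weight_gb(35, q):5.1f} GB")
```

So even IQ4_XS weights (~18.6 GB) don't fully fit in 16GB, which is why -ncmoe is offloading expert layers to CPU already. Going up a quant should still work, it just means raising -ncmoe to push more experts to system RAM, trading speed for quality.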

