AI Navigate

HELP - What settings do you use? Qwen3.5-35B-A3B

Reddit r/LocalLLaMA / 3/21/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The post discusses configuring Qwen3.5-35B-A3B on a 16GB GPU using llama.cpp and asks for recommended settings and quantization size.
  • It includes a concrete llama-server command line with many flags to illustrate the current setup.
  • The author wonders whether a larger quantization size is possible and notes community interest in the model.
  • It is a practical, user-submitted inquiry on Reddit seeking setup guidance rather than a new AI release.

I have a 16GB 9070 XT. What settings and quant size do you use for Qwen3.5-35B-A3B?

I see a lot of people giving love to Qwen3.5-35B-A3B, but I feel like I'm setting it up incorrectly. I'm using llama.cpp.

Can I go up a size in quant?

cmd: C:\llamaROCM\llama-server.exe --port ${PORT} -m "C:\llamaROCM\models\Huihui-Qwen3.5-35B-A3B-abliterated.i1-IQ4_XS.gguf" -c 8192 -np 1 -ngl 99 -ncmoe 16 -fa on --temp 0.7 --top-k 20 --top-p 0.95 --min-p 0.00 --cache-type-k f16 --cache-type-v f16 --threads 12 --context-shift --sleep-idle-seconds 300 -b 4096 -ub 2048
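On the "can I go up a quant size" question, a back-of-the-envelope estimate helps: GGUF file size is roughly parameters × bits-per-weight ÷ 8, and whatever doesn't fit in VRAM has to be offloaded to CPU (e.g. via `-ncmoe`). The sketch below uses approximate average bits-per-weight figures for common quant types; exact file sizes vary by model and tensor mix, so treat the numbers as rough guides, not guarantees.

```python
# Back-of-the-envelope GGUF size estimate for a 35B-parameter model.
# Bits-per-weight (bpw) values are approximate averages for each quant
# type; real files differ slightly because some tensors stay at higher
# precision.
QUANT_BPW = {
    "IQ4_XS": 4.25,
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.70,
    "Q6_K":   6.56,
}

def model_size_gib(params_billions: float, bpw: float) -> float:
    """Estimated file size in GiB: params * bits-per-weight / 8 bytes."""
    return params_billions * 1e9 * bpw / 8 / 2**30

for quant, bpw in QUANT_BPW.items():
    size = model_size_gib(35, bpw)
    print(f"{quant:7s} ~{size:5.1f} GiB")
```

By this estimate even IQ4_XS (~17 GiB of weights) already exceeds 16 GB of VRAM before KV cache, which is why flags like `-ncmoe` (keeping some MoE expert layers on CPU) matter; stepping up to Q5_K_M adds several more GiB that would need correspondingly more CPU offload.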
submitted by /u/uber-linny