Maybe this will be helpful for someone:
llama-server -m '/Qwen3.6-27B/Qwen3.6-27B-IQ4_XS.gguf' -ngl 999 -ctk q4_0 -ctv q4_0 -b 128 -ub 128 -c 24000
I can't run this model with higher KV-cache quants at context sizes above 8192.
Setting -ub and -b to 256 allowed me a max of 16384 ctx.
The max ctx size I get is 24k. Disabling GNOME freed an additional ~300 MiB of VRAM.
It's kinda nice, but I know it's of limited use in many cases.
Without quantizing the context, this GPU only loads 63/65 layers at this quant. But the KV cache is still q4, so I think that's good enough.
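For anyone curious why -ctk/-ctv q4_0 frees so much room, here's a rough back-of-the-envelope estimator. The model dimensions (n_layers, n_kv_heads, head_dim) are placeholder guesses, not the real Qwen3.6-27B config, and 4.5 bits/element for q4_0 (4-bit values plus per-block scale) is an approximation:

```python
# Rough KV-cache VRAM estimate. All model dimensions below are assumed
# placeholders, NOT the actual Qwen3.6-27B config.
def kv_cache_bytes(n_ctx, n_layers=62, n_kv_heads=8, head_dim=128,
                   bits_per_elem=16.0):
    # K and V each store n_ctx * n_kv_heads * head_dim elements per layer
    elems = 2 * n_layers * n_ctx * n_kv_heads * head_dim
    return elems * bits_per_elem / 8

f16  = kv_cache_bytes(24000, bits_per_elem=16.0)
q4_0 = kv_cache_bytes(24000, bits_per_elem=4.5)  # ~4.5 bits/elem for q4_0
print(f"f16 cache:  {f16 / 2**30:.2f} GiB")
print(f"q4_0 cache: {q4_0 / 2**30:.2f} GiB")
```

Whatever the exact numbers for this model, the ratio is what matters: q4_0 cache is roughly 4.5/16 ≈ 28% the size of an f16 cache, which is why the extra ctx fits.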
I used unsloth quant: https://huggingface.co/unsloth/Qwen3.6-27B-GGUF?show_file_info=Qwen3.6-27B-IQ4_XS.gguf