I wanted to share this one with the community, as i was surprised I got it working, and that its as performant as it is. IQ3 is generally really really bad on any model... but ive found that not to be the case on Qwen3.5 since the 27B is just so capable.
My starting point was this: https://github.com/willbnu/Qwen-3.5-16G-Vram-Local but I wasnt able to fully reproduce the results seen until i configured as below.
Benchmark comparison - Baseline (ctx-checkpoints=8, Q3_K_S): prompt ≈ 185.8 t/s, gen ≈ 48.3 t/s — qwen-guide/benchmark_port8004_20260311_233216.json
ctx-checkpoints=0 (same model): prompt ≈ 478.3 t/s, gen ≈ 48.7 t/s — qwen-guide/benchmark_port8004_20260312_000246.json
Hauhau IQ3_M locked profile (port 8004): prompt ≈ 462.7 t/s, gen ≈ 48.4 t/s — qwen-guide/benchmark_port8004_20260312_003521.json
Final locked profile parameters - Model: Qwen3.5-27B-Uncensored-HauhauCS-Aggressive-IQ3_M.gguf - Context: 32,768 - GPU layers: 99 (all 65 layers on GPU) - KV cache types: K=iq4_nl, V=iq4_nl - Batch / UBatch: 1024 / 512 - Threads: 6 - ctx-checkpoints: 0 - Reasoning budget: 0 - Parallel: 1 - Flash attention: on - Launcher script: scripts/start_quality_locked.sh - Port: 8004
[link] [comments]