AI Navigate

Qwen3.5-27B-IQ3_M, 5070ti 16GB, 32k context: ~50t/s

Reddit r/LocalLLaMA / 3/12/2026

💬 OpinionDeveloper Stack & InfrastructureTools & Practical UsageModels & Research

Key Points

  • The post reports that Qwen3.5-27B runs locally on a 5070ti 16GB card with 32k context and achieves unexpectedly high prompt throughput.
  • The results build on the willbnu/Qwen-3.5-16G-Vram-Local repo but require a specific locked profile and configuration to reproduce.
  • Benchmark numbers show prompt throughput around 462.7–478.3 t/s and generation around 48 t/s across configurations, highlighting strong prompt speed.
  • The final setup uses Qwen3.5-27B-Uncensored-HauhauCS-Aggressive-IQ3_M.gguf with 32,768 context, 99 GPU layers, iq4_nl caches, batch 1024 / 512, 6 threads, ctx-checkpoints 0, flash attention on, and port 8004, illustrating how to replicate.

I wanted to share this one with the community, as i was surprised I got it working, and that its as performant as it is. IQ3 is generally really really bad on any model... but ive found that not to be the case on Qwen3.5 since the 27B is just so capable.

My starting point was this: https://github.com/willbnu/Qwen-3.5-16G-Vram-Local but I wasnt able to fully reproduce the results seen until i configured as below.

Benchmark comparison - Baseline (ctx-checkpoints=8, Q3_K_S): prompt ≈ 185.8 t/s, gen ≈ 48.3 t/s — qwen-guide/benchmark_port8004_20260311_233216.json

  • ctx-checkpoints=0 (same model): prompt ≈ 478.3 t/s, gen ≈ 48.7 t/s — qwen-guide/benchmark_port8004_20260312_000246.json

  • Hauhau IQ3_M locked profile (port 8004): prompt ≈ 462.7 t/s, gen ≈ 48.4 t/s — qwen-guide/benchmark_port8004_20260312_003521.json

Final locked profile parameters - Model: Qwen3.5-27B-Uncensored-HauhauCS-Aggressive-IQ3_M.gguf - Context: 32,768 - GPU layers: 99 (all 65 layers on GPU) - KV cache types: K=iq4_nl, V=iq4_nl - Batch / UBatch: 1024 / 512 - Threads: 6 - ctx-checkpoints: 0 - Reasoning budget: 0 - Parallel: 1 - Flash attention: on - Launcher script: scripts/start_quality_locked.sh - Port: 8004

submitted by /u/ailee43
[link] [comments]