QWEN3.6 + ik_llama is fast af

Reddit r/LocalLLaMA / 4/20/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post reports a local AI inference setup running Qwen3.6 (UD_Q_4_K_M quant) via ik_llama on a machine with 16GB of VRAM and 32GB of RAM.
  • It claims high throughput: a 200k-token context window (200k cw) decoding at over 50 tokens per second; a quick way to check a figure like this is sketched after this list.
  • The post is framed as a Reddit user’s practical benchmark for running the model locally, with the emphasis on speed.
  • The content is focused on performance results rather than any new model release or official announcement.
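For anyone wanting to sanity-check a tok/s figure like this on their own box, here is a minimal Python sketch that times a single completion against a locally running server. It assumes ik_llama (a llama.cpp fork) is serving a llama.cpp-style OpenAI-compatible endpoint on http://localhost:8080; the port, model id, prompt, and token budget are placeholders, not details from the post.

```python
# Minimal sketch: measure decode throughput against a locally running ik_llama server.
# Assumes a llama.cpp-style OpenAI-compatible endpoint at http://localhost:8080/v1;
# the port, model id, and prompt below are placeholders, not taken from the post.
import time
import requests

URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "qwen3.6-UD_Q_4_K_M",  # placeholder model id
    "messages": [
        {"role": "user", "content": "Summarize the benefits of local inference."}
    ],
    "max_tokens": 256,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=600)
resp.raise_for_status()
elapsed = time.time() - start

# The server is expected to return an OpenAI-style "usage" block with token counts.
data = resp.json()
completion_tokens = data["usage"]["completion_tokens"]
print(f"{completion_tokens} tokens in {elapsed:.1f}s "
      f"-> {completion_tokens / elapsed:.1f} tok/s")
```

Dividing completion tokens by wall-clock time gives a rough decode throughput to compare against the post's 50+ tok/s claim, though for a single short prompt the prompt-processing time is folded into the measurement.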
running qwen3.6 UD_Q_4_K_M on 16GB vram + 32GB ram with 200k cw @50+ tok/s

submitted by /u/_BigBackClock