I'm getting ~13 tokens/s on the Q8_0 quant, with a 128,000-token context window and the KV cache quantized to Q8_0 (both K and V).
This is on 3 GPUs (1x 2060 Super 8 GB, 2x 5060 Ti 16 GB), via llama.cpp.
I'm unsure whether this is slow or to be expected?
```
*/llama-server --port 8080 --model */llama.cpp/Qwen3.6-27B-Q8_0/Qwen3.6-27B-Q8_0.gguf -mm */Qwen3.6-27B-Q8_0/mmproj-BF16.gguf -np 1 --temperature 1.0 --top-p 0.95 --top-k 20 --min-p 0.0 --presence-penalty 1.5 --repeat-penalty 1.0 --chat-template-kwargs '{"preserve_thinking": true}' --cache-type-k q8_0 --cache-type-v q8_0 -c 128000 --fit-target 1536
```

(`--fit-target 1536` was to leave some headroom for the vision capability to work.)
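For comparison, one way to isolate raw generation speed from the server stack is llama.cpp's bundled `llama-bench` tool. This is only a sketch: the model path is a placeholder, and the `-ts` tensor split (roughly proportional to the 8/16/16 GB VRAM mix) is an assumption, not something from the original command.

```shell
# Sketch, unverified on this exact setup:
#   model path is a placeholder;
#   -ts 8,16,16 is an assumed split proportional to each card's VRAM;
#   -ctk/-ctv match the server's quantized KV cache;
#   -p 512 -n 128 measure prompt processing and generation separately.
./llama-bench \
  -m ./Qwen3.6-27B-Q8_0/Qwen3.6-27B-Q8_0.gguf \
  -ngl 99 \
  -ts 8,16,16 \
  -ctk q8_0 -ctv q8_0 \
  -p 512 -n 128
```

Comparing the `tg` (token generation) numbers from `llama-bench` against the ~13 tps seen through `llama-server` would show whether the bottleneck is the model/hardware itself or the serving configuration (large context allocation, vision projector, etc.).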
