Qwen3.6 27B seems to struggle at 90k in a 128k ctx window

Reddit r/LocalLLaMA / 4/30/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A Reddit user reports using Qwen3.6 27B in GGUF (Q4_K_XL) on an RX 7900 XTX and observing excellent coding performance for prompts under 64k context.
  • When attempting more complex devops-related tasks that require tool calling at around 90k context within a 128k context window, the model reportedly fails to perform tool calling reliably.
  • The user ran the model via llama.cpp with a 128000 context length and specific sampling parameters (e.g., temp 0.6, top-p 0.95) and asks others about their experiences.
  • The post is primarily a troubleshooting/experience-sharing question about long-context behavior and limitations rather than an official benchmark or release.
  • Overall, the anecdotal evidence suggests long-context degradation and tool-calling instability at very high context lengths for this setup.
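One way to pin down where this kicks in is to hold the setup fixed and vary only how full the context is when a tool call is requested. Below is a minimal curl sketch against llama-server's OpenAI-compatible /v1/chat/completions endpoint (assuming the server was launched with --jinja so tool calls are exposed; the list_files tool and the padding placeholder are hypothetical):

# Hedged sketch: request a tool call after padding the prompt toward
# ~90k tokens. Assumes llama-server on the default port 8080, started
# with --jinja; the tool definition is made up for this test.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user",
       "content": "<pad with ~90k tokens of repo context> Now list the files in /etc."}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "list_files",
        "description": "List files in a directory",
        "parameters": {
          "type": "object",
          "properties": {"path": {"type": "string"}},
          "required": ["path"]
        }
      }
    }]
  }' | jq '.choices[0].message.tool_calls'

If the reported degradation is real for a given setup, the tool_calls field tends to come back null or malformed near 90k while the same request succeeds below 64k.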

I have an RX 7900 XTX running Qwen3.6 27B Q4_K_XL, getting ~400 t/s prompt processing and ~30 t/s generation. Everything below 64k is incredible, and it spits out good-quality code.

But when I tried to push it further on fairly complex DevOps-related work, it failed at tool calling at 90k ctx.

I use opencode as my harness, and here is the llama.cpp command I ran:

llama-server -ctv q8_0 -ctk q8_0 -c 128000 --temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.0 --fit on
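For readability, here is the same invocation as a sketch with the flags spelled out (the model path is hypothetical, and --fit is reproduced as posted):

# -c 128000: 128k-token context window
# -ctk/-ctv q8_0: quantize the KV cache to 8-bit, roughly halving its
#   VRAM footprint vs f16 at this context length
# --repeat-penalty 1.0: penalty disabled (1.0 is a no-op)
llama-server \
  -m ./Qwen3.6-27B-Q4_K_XL.gguf \
  -ctk q8_0 -ctv q8_0 \
  -c 128000 \
  --temp 0.6 --top-p 0.95 --top-k 20 \
  --repeat-penalty 1.0 \
  --fit on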

What's your experience?

submitted by /u/dodistyo