Qwen3-Coder-Next with llama.cpp shenanigans

Reddit r/LocalLLaMA / 3/14/2026

💬 OpinionTools & Practical UsageModels & Research

共有:

Key Points

The author reports that Qwen3-Coder-Next performs poorly with llama.cpp, frequently looping and failing to call tools, even after an autoparser merge.
They tested Claude code, Qwen code, OpenCode, and found the model to be non-performant across these options.
The post includes their exact llama-server command and quantization setup (UD-Q8_K_XL) and notes trying different quant methods after redownloading.
An edit mentions that switching to bartowski quant resolves the issues for them, suggesting quant method as a key factor.
The author asks others what setups work and invites discussion on how to make the model function reliably.

For the life of me I don't get how is Q3CN of any value for vibe coding, I see endless posts about the model's ability and it all strikes me very strange because I cannot get the same performance. The model loops like crazy, can't properly call tools, goes into wild workarounds to bypass the tools it should use. I'm using llama.cpp and this happened before and after the autoparser merge. The quant is unsloth's UD-Q8_K_XL, I've redownloaded after they did their quant method upgrade, but both models have the same problem.

I've tested with claude code, qwen code, opencode, etc... and the model is simply non performant in all of them.

Here's my command:

```bash

llama-server -m ~/.cache/hub/huggingface/hub/models--unsloth--Qwen3-Coder-Next-GGUF/snapshots/ce09c67b53bc8739eef83fe67b2f5d293c270632/UD-Q8_K_XL/Qwen3-Coder-Next-UD-Q8_K_XL-00001-of-00003.gguf --temp 0.8 --top-p 0.95 --min-p 0.01 --top-k 40 --batch-size 4096 --ubatch-size 1024 --dry-multiplier 0.5 --dry-allowed-length 5 --frequency_penalty 0.5 --presence-penalty 1.10

```

Is it just my setup? What are you guys doing to make this model work?

EDIT: as per this comment I'm now using bartowski quant without issues

submitted by /u/JayPSec
[link] [comments]