Help running Qwen3-Coder-Next TurboQuant (TQ3) model

Reddit r/LocalLLaMA / 4/4/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • A user asks how to run a TurboQuant-quantized version of the Qwen3-Coder-Next model (TQ3), pointing to a specific Hugging Face model card that states it requires an inference engine supporting TurboQuant.
  • The model card includes a `llama-server` command but does not clearly indicate which compatible llama.cpp version or fork should be used.
  • The user tried multiple llama.cpp forks that claim TurboQuant/TQ3 support, but none worked in their setup.
  • They request guidance from anyone who has successfully run the model, implying there may be a missing configuration detail, incompatible build, or unclear engine requirement.

I found a TQ3-quantized version of Qwen3-Coder-Next here:
https://huggingface.co/edwardyoon79/Qwen3-Coder-Next-TQ3_0

According to the page, this model requires a compatible inference engine that supports TurboQuant. It also provides a `llama-server` command, but it doesn’t clearly specify which version or fork of llama.cpp should be used (or maybe I missed it).
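
For reference, a typical `llama-server` launch for a local GGUF file looks something like the sketch below. The flags are standard llama.cpp options; the filename is an assumption inferred from the repo name (it is not copied from the model card), and a build with TurboQuant support is assumed.

```bash
# Minimal sketch, assuming a llama.cpp build that supports TurboQuant.
# The GGUF filename is inferred from the repo name, not from the card.
llama-server \
  -m ./Qwen3-Coder-Next-TQ3_0.gguf \
  -c 32768 \
  -ngl 99 \
  --port 8080
# Then point any OpenAI-compatible client at http://localhost:8080/v1
```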

I’ve tried several llama.cpp forks that claim to support TQ3, but none of them worked for me.
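
One way to narrow this down is to check whether a given fork’s build actually knows about the TQ3 tensor type before blaming the launch flags. The sketch below builds a fork from source and greps the quantize tool’s usage output, which lists the supported quantization types; the fork URL is a placeholder, not a recommendation.

```bash
# Hypothetical sanity check; replace the placeholder URL with the fork
# under test. llama-quantize prints its allowed quantization types in
# its usage text when run without arguments.
git clone https://github.com/SOME_FORK/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
./build/bin/llama-quantize 2>&1 | grep -i tq3
```

If the grep returns nothing, that build has no TQ3 type registered, and `llama-server` will refuse to load the file no matter how it is invoked.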

If anyone has successfully run this model, I’d really appreciate it if you could share how you did it.

submitted by /u/UnluckyTeam3478