I found a TQ3-quantized version of Qwen3-Coder-Next here:
https://huggingface.co/edwardyoon79/Qwen3-Coder-Next-TQ3_0
According to the page, this model requires a compatible inference engine that supports TurboQuant. The page also provides a `llama-server` command, but it doesn't clearly specify which version or fork of llama.cpp should be used (or maybe I missed it).
I’ve tried the following llama.cpp forks that claim to support TQ3, but none of them worked for me:
- https://github.com/TheTom/llama-cpp-turboquant
- https://github.com/turbo-tan/llama.cpp-tq3
- https://github.com/drdotdot/llama.cpp-turbo3-tq3
If anyone has successfully run this model, I’d really appreciate it if you could share how you did it.
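For reference, the kind of invocation I was attempting looks roughly like this. The model filename and flags here are placeholders based on standard `llama-server` usage, not copied from the model page, and the binary would have to come from whichever fork actually supports TQ3:

```shell
# Placeholder sketch: assumes the fork's build produces a llama-server binary
# and that the GGUF file is named after the repo (not confirmed on the page).
./llama-server \
  -m Qwen3-Coder-Next-TQ3_0.gguf \  # path to the TQ3-quantized model file
  -c 4096 \                         # context size
  --port 8080                       # local port for the HTTP server
```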