Turbo Quant on weights: 2x speed

Reddit r/LocalLLaMA / 4/2/2026


Key Points

  • A new quantized model variant, TQ3_4S, is announced as part of “Turbo Quant,” claiming about 2x faster inference while maintaining the same model size compared with TQ3_1S.
  • The author reports that TQ3_4S delivers better quality than TQ3_1S, positioning it as an improvement for local LLM quantized deployments.
  • The article links to the Hugging Face model page for “Qwen3.5-27B-TQ3_4S,” making the artifact readily available for testing.
  • Despite the claimed improvements, the author notes that the reference quantization Q3_K_S still has a slight edge on median perplexity (PPL), and says further tuning is planned for future releases.

https://preview.redd.it/hvkmfmp3mnsg1.png?width=1228&format=png&auto=webp&s=12e7bc31b08a734aec424b18ff17b4e517020ea6

Happy to announce TQ3_4S.
2x faster, better quality than TQ3_1S, same size.

https://huggingface.co/YTan2000/Qwen3.5-27B-TQ3_4S
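
For anyone who wants to grab the artifact programmatically, here is a minimal sketch using huggingface_hub. It assumes the repo ships a GGUF file (the llama.cpp-style quant name suggests it), and the filename is hypothetical, so check the repo's file list for the real one:

```python
# Minimal sketch: download the TQ3_4S quant from Hugging Face.
# The filename below is hypothetical -- check the repo's "Files" tab
# for the actual GGUF name before running.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="YTan2000/Qwen3.5-27B-TQ3_4S",
    filename="Qwen3.5-27B-TQ3_4S.gguf",  # hypothetical filename
)
print("downloaded to:", path)

# Running it presumably requires a llama.cpp build that supports the
# custom TQ3_4S quant type; stock builds may not recognize it.
```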

Please note: on median PPL, Q3_K_S still has a slight edge.
My next model has beaten Q3_K_S on median PPL, but it needs more tweaking.
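
For readers unfamiliar with the metric: perplexity is typically computed per evaluation chunk as the exponential of the mean negative log-likelihood, and the median over chunks is less sensitive to a few pathological chunks than the mean. A small self-contained sketch with synthetic log-probs (not the author's actual eval harness):

```python
import math
import statistics

def chunk_perplexity(token_logprobs):
    """Perplexity of one chunk: exp(-mean log p(token))."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Synthetic per-token log-probs for three chunks (illustrative only).
chunks = [
    [-1.2, -0.8, -2.1, -0.5],
    [-0.9, -1.1, -0.7, -1.4],
    [-3.0, -0.6, -0.9, -1.0],  # one hard chunk skews the mean
]

ppls = [chunk_perplexity(c) for c in chunks]
print("per-chunk PPL:", [round(p, 2) for p in ppls])
print("mean PPL:  ", round(statistics.mean(ppls), 2))
print("median PPL:", round(statistics.median(ppls), 2))  # robust to the outlier
```

In the toy numbers above, the one hard chunk drags the mean up while the median barely moves, which is presumably why the author quotes median rather than mean PPL.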

submitted by /u/Imaginary-Anywhere23