ubergarm/Kimi-K2.6-GGUF Q4_X now available

Reddit r/LocalLLaMA / 4/21/2026

📰 News · Tools & Practical Usage · Models & Research

Key Points

  • The ubergarm/Kimi-K2.6 GGUF Q4_X model is now available, with community guidance on patching and quantizing the “full size” Kimi-K2.6 Q4_X variant.
  • The Q4_X build is reported to run on both ik_llama.cpp and mainline llama.cpp, but it needs over roughly 584GB of combined RAM and VRAM.
  • The author plans to follow up with an imatrix for anyone making custom quantizations, plus smaller quant variants that run on ik_llama.cpp.
  • AesSedai is expected to contribute mainline MoE-optimized recipes soon, and the post invites comparison with GLM-5.1.

Big thanks to jukofyork and AesSedai for giving me some tips today on patching and quantizing the "full size" Kimi-K2.6 "Q4_X". It runs on both ik_llama.cpp and mainline llama.cpp if you have over ~584GB of RAM+VRAM...
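
If you'd rather script against it than drive the CLI directly, a minimal llama-cpp-python sketch looks roughly like this. The shard filename, offload split, and context size are my own placeholder assumptions, not values from this post; tune them to your RAM/VRAM budget:

```python
# Minimal sketch: loading a big split GGUF via llama-cpp-python.
# Point model_path at the FIRST shard; the rest are picked up automatically.
# The shard name, n_gpu_layers, and n_ctx below are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="Kimi-K2.6-Q4_X-00001-of-00013.gguf",  # hypothetical shard name
    n_gpu_layers=20,  # offload whatever fits in VRAM; the rest stays in system RAM
    n_ctx=8192,       # keep context modest so the KV cache doesn't blow the budget
)

out = llm("Say hello in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```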

I'll follow up soon with an imatrix for anyone else making custom quants, and some smaller quants that run on ik_llama.cpp. AesSedai will likely have mainline MoE-optimized recipes up soon too!
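
For anyone who wants to roll their own quants once that imatrix lands, the usual mainline llama.cpp flow is: collect importance-matrix stats with llama-imatrix, then pass them to llama-quantize. Here's a rough sketch driven from Python; all file paths are placeholders, and it targets a stock Q4_K_M rather than the patched Q4_X recipe:

```python
# Rough sketch of the mainline llama.cpp imatrix -> quantize flow.
# All paths and the calibration corpus are placeholder assumptions.
import subprocess

SOURCE_GGUF = "Kimi-K2.6-BF16-00001-of-00061.gguf"  # hypothetical full-precision source
CALIB_TXT = "calibration.txt"                       # any representative text corpus
IMATRIX = "imatrix.dat"

# 1) Gather importance-matrix statistics over the calibration text.
subprocess.run(
    ["llama-imatrix", "-m", SOURCE_GGUF, "-f", CALIB_TXT, "-o", IMATRIX],
    check=True,
)

# 2) Quantize, letting the imatrix guide per-tensor precision choices.
subprocess.run(
    ["llama-quantize", "--imatrix", IMATRIX, SOURCE_GGUF,
     "Kimi-K2.6-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```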

Cheers, and I'm curious how this big one compares with GLM-5.1.

submitted by /u/VoidAlchemy