Big thanks to jukofyork and AesSedai today for giving me some tips on patching and quantizing the "full size" Kimi-K2.6 "Q4_X". It runs on both ik and mainline llama.cpp if you have over ~584GB RAM+VRAM... I'll follow up with an imatrix for anyone else making custom quants, and some smaller quants that run on ik_llama.cpp soon. AesSedai will likely have mainline MoE-optimized recipes up soon too! Cheers, and curious how this big one compares with GLM-5.1.
ubergarm/Kimi-K2.6-GGUF Q4_X now available
Reddit r/LocalLLaMA / 4/21/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- The ubergarm/Kimi-K2.6-GGUF Q4_X quant is now available; the author credits community tips from jukofyork and AesSedai for patching and quantizing the "full size" model.
- The Q4_X build reportedly runs on both ik_llama.cpp and mainline llama.cpp, but it needs over ~584 GB of combined RAM and VRAM (see the sketch after this list).
- The author plans to follow up with an imatrix for others making custom quants, plus additional smaller quant variants that run on ik_llama.cpp.
- AesSedai is expected to contribute mainline MoE-optimized recipes soon, and the post invites comparison with GLM-5.1.
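For readers who want to try the quant, here is a minimal sketch of fetching the Q4_X shards from the Hugging Face repo and loading them with mainline llama.cpp. The `allow_patterns` filter, the shard filenames, and the `-ngl`/context values are assumptions for illustration, not details confirmed by the post; how many layers you can offload depends on how much of the ~584 GB budget sits in VRAM versus system RAM.

```python
# Sketch: download the Q4_X quant and run it with mainline llama.cpp.
# Assumptions (not confirmed by the post): shard filenames contain "Q4_X",
# and llama.cpp's usual convention of pointing at the first split file applies.
import subprocess
from pathlib import Path

from huggingface_hub import snapshot_download

local_dir = Path("models/Kimi-K2.6-GGUF")

# Pull only the Q4_X shards from the repo (hypothetical filename pattern).
snapshot_download(
    repo_id="ubergarm/Kimi-K2.6-GGUF",
    allow_patterns=["*Q4_X*"],
    local_dir=local_dir,
)

# llama.cpp loads split GGUFs when given the first shard; adjust to whatever
# filenames the download actually produced.
first_shard = sorted(local_dir.rglob("*.gguf"))[0]

# -ngl sets how many layers go to VRAM; with a model this large, most of the
# expert weights will typically stay in system RAM.
subprocess.run([
    "llama-cli",
    "-m", str(first_shard),
    "-ngl", "8",       # illustrative value, tune to your VRAM
    "-c", "8192",      # context size
    "-p", "Hello from Kimi-K2.6 Q4_X",
], check=True)
```

Per the post, the same GGUF also loads with ik_llama.cpp, and the promised imatrix would be what you pass to `llama-quantize --imatrix` if you want to roll your own smaller quants from a higher-precision base.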