Just quantized MiniMax-M2.7 (229B MoE) — first GGUF quants available on HuggingFace.
Files:
- Q3_K_L (~110 GB) — fits in 128 GB unified memory
- Q8_0 (~243 GB) — for 256GB+ setups
https://huggingface.co/ox-ox/MiniMax-M2.7-GGUF
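
If you only want one quant, a pattern filter keeps the download manageable. A minimal sketch with the Hugging Face CLI; the repo id is from the link above, but the exact filename pattern in the repo is an assumption:

```
# Grab only the Q3_K_L shards (filename pattern assumed, adjust to the repo listing)
huggingface-cli download ox-ox/MiniMax-M2.7-GGUF \
  --include "*Q3_K_L*" \
  --local-dir ./MiniMax-M2.7-GGUF
```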
PPL benchmark running now (c=512, seed=1337) — will update with results.
Baseline from M2.5 Q3_K_L: 8.7948 PPL, 28.7 t/s
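
For anyone who wants to reproduce the numbers once they land, this is roughly the llama.cpp invocation matching those settings. The model path and test corpus are placeholders (wikitext-2 is the usual choice, but the post doesn't say which file was used):

```
# PPL run at ctx 512 with a fixed seed, matching c=512 / seed=1337 above
./llama-perplexity \
  -m ./MiniMax-M2.7-Q3_K_L.gguf \
  -f ./wikitext-2-raw/wiki.test.raw \
  -c 512 -s 1337
```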
Architecture: MiniMax-M2 MoE, 256 experts, 8 active/token (so only 8/256 ≈ 3% of expert weights fire per forward pass).
Source: FP8 safetensors → Q8_0 → Q3_K_L via llama.cpp.
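
A rough sketch of that pipeline with llama.cpp's stock tools. Paths are hypothetical, and depending on your llama.cpp version the FP8 source may need upcasting to BF16 before `convert_hf_to_gguf.py` will accept it:

```
# HF safetensors -> Q8_0 GGUF (convert script lives in the llama.cpp repo)
python convert_hf_to_gguf.py ./MiniMax-M2.7 \
  --outtype q8_0 --outfile MiniMax-M2.7-Q8_0.gguf

# Q8_0 -> Q3_K_L; requantizing from an already-quantized GGUF needs the flag
./llama-quantize --allow-requantize \
  MiniMax-M2.7-Q8_0.gguf MiniMax-M2.7-Q3_K_L.gguf Q3_K_L
```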