MiniMax-M2.7 Q3_K_L & Q8_0 — First GGUF quants, Apple Silicon (M3 Max 128GB)

Reddit r/LocalLLaMA / 4/12/2026


Key Points

  • MiniMax-M2.7 (229B MoE) has been quantized to GGUF and published on Hugging Face; these are the first GGUF quants for this model.
  • Two variants were released: Q3_K_L (~110GB) targeting Apple Silicon with 128GB unified memory, and Q8_0 (~243GB) intended for 256GB+ setups.
  • The quantization pipeline converts FP8 safetensors into Q8_0 and then Q3_K_L using llama.cpp.
  • The post reports an in-progress perplexity (PPL) benchmark (context 512, seed 1337) and gives a baseline from the prior M2.5 Q3_K_L quant: PPL 8.7948 at 28.7 tokens/s.
  • The architecture is a MiniMax-M2 MoE with 256 experts and 8 experts active per token, so only a small fraction of the 229B parameters is exercised for each token; this drives the performance and memory tradeoffs across hardware tiers.

Just quantized MiniMax-M2.7 (229B MoE) — first GGUF quants available on HuggingFace.

Files:

- Q3_K_L (~110 GB) — fits 128GB unified memory

- Q8_0 (~243 GB) — for 256GB+ setups

https://huggingface.co/ox-ox/MiniMax-M2.7-GGUF

PPL benchmark running now (c=512, seed=1337) — will update with results.

Baseline from M2.5 Q3_K_L: 8.7948 PPL, 28.7 t/s
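For reference, a run like this is typically driven with llama.cpp's llama-perplexity tool over a raw text file. The sketch below is a guess at the invocation; the model path and eval corpus are placeholders, and only the context size and seed come from the post.

```python
# Minimal sketch: launching a llama.cpp perplexity run from Python.
# Model path and eval corpus (wiki.test.raw) are placeholders; the post
# only states context size 512 and seed 1337.
import subprocess

cmd = [
    "./llama-perplexity",              # llama.cpp perplexity tool
    "-m", "MiniMax-M2.7-Q3_K_L.gguf",  # quantized model under test (placeholder name)
    "-f", "wiki.test.raw",             # evaluation text (assumed corpus)
    "-c", "512",                       # context size from the post
    "-s", "1337",                      # RNG seed from the post
]
subprocess.run(cmd, check=True)
```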

Architecture: MiniMax-M2 MoE, 256 experts, 8 active/token.
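As a rough sanity check on those numbers, the sketch below works out the effective bits per weight implied by the quoted file sizes and the fraction of expert weights touched per token. The 229B parameter count and file sizes come from the post; the rest is back-of-envelope arithmetic, not measured data.

```python
# Back-of-envelope sketch: effective bits per weight implied by the quoted
# GGUF file sizes, plus the fraction of experts active per token.
TOTAL_PARAMS = 229e9                       # MiniMax-M2.7 total parameters (from the post)
SIZES_GB = {"Q3_K_L": 110, "Q8_0": 243}    # approximate file sizes (from the post)

for name, gb in SIZES_GB.items():
    bpw = gb * 1e9 * 8 / TOTAL_PARAMS      # bits per weight implied by the file size
    print(f"{name}: ~{bpw:.2f} bits/weight")
# Q8_0 works out to ~8.5 bpw, matching Q8_0's nominal layout
# (8-bit weights plus one fp16 scale per 32-weight block).

# MoE routing: only 8 of 256 experts run per token, so roughly 3% of the
# expert weights are read for each token (shared attention/embedding weights
# add to this; the post doesn't break them out).
print(f"active expert fraction: {8 / 256:.1%}")
```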

Source: FP8 safetensors → Q8_0 → Q3_K_L via llama.cpp.
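One plausible shape for that pipeline with stock llama.cpp tooling is sketched below; the paths are placeholders, and the exact handling of the FP8 checkpoint may differ from what the poster actually did.

```python
# Sketch of an FP8-safetensors -> Q8_0 -> Q3_K_L pipeline using llama.cpp tools.
# Directory and file names are placeholders.
import subprocess

# Step 1: convert the Hugging Face checkpoint to a Q8_0 GGUF.
subprocess.run([
    "python", "convert_hf_to_gguf.py",
    "MiniMax-M2.7/",                        # HF checkpoint directory (placeholder)
    "--outfile", "MiniMax-M2.7-Q8_0.gguf",
    "--outtype", "q8_0",
], check=True)

# Step 2: requantize the Q8_0 GGUF down to Q3_K_L.
subprocess.run([
    "./llama-quantize",
    "MiniMax-M2.7-Q8_0.gguf",
    "MiniMax-M2.7-Q3_K_L.gguf",
    "Q3_K_L",
], check=True)
```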

submitted by /u/Remarkable_Jicama775