| Hey r/LocalLLaMA, we investigated MiniMax-M2.7 GGUFs producing NaNs during perplexity evaluation. Our findings show the issue affects 21%–38% of all GGUFs on Hugging Face (not just ours).
Which quants did we test?
Also, CUDA 13.2 is still definitely an issue: it causes some low-bit quants, across all models, to output gibberish. Some people have dismissed it as a non-issue, but from what we’ve seen, more than 50 people have now confirmed that downgrading to CUDA 13.1 or lower fixes it. You can also see some of the public comments in our Hugging Face discussions, Reddit posts, etc. NVIDIA has acknowledged that they are investigating the issue; see Unsloth Issue 4849, llama.cpp issue 21255, and issue 21371. If you have any questions, please do ask, and thank you again for all the support as always. Appreciate it and hope you have a lovely week. |
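The post reports NaNs first appearing at a specific evaluation chunk (chunk 32 for the affected quants). A minimal sketch of that kind of check, assuming you have already collected per-chunk perplexity values from a tool like llama.cpp's `llama-perplexity` (the helper name `first_nan_chunk` is illustrative, not from the original post):

```python
import math

def first_nan_chunk(chunk_ppls):
    """Return the 1-based index of the first chunk whose perplexity
    is NaN or infinite, or None if every chunk is finite."""
    for i, ppl in enumerate(chunk_ppls, start=1):
        if not math.isfinite(ppl):
            return i
    return None

# Example: a run whose values go bad at chunk 32, matching the
# behavior reported for the affected MiniMax-M2.7 quants.
ppls = [5.2] * 31 + [float("nan")] * 3
print(first_nan_chunk(ppls))  # → 32
```

Scanning for the first non-finite chunk, rather than only the final aggregate PPL, is what lets you pin the failure to a specific point in the evaluation.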
MiniMax M2.7 GGUF Investigation, Fixes, Benchmarks
Reddit r/LocalLLaMA / 4/15/2026
💬 Opinion, Tools & Practical Usage, Models & Research
Key Points
- An investigation found MiniMax-M2.7 GGUF files can produce NaNs during Perplexity evaluation, affecting an estimated 21%–38% of GGUF uploads on Hugging Face.
- The issue was traced to numerical overflow behavior in llama.cpp, with NaNs appearing at specific evaluation chunks (notably chunk 32, and sometimes chunk 311).
- The root trigger was identified as `blk.61.ffn_down_exps`, where particular quantization variants (e.g., Q4_K and Q5_K families) cause NaNs starting at chunk 32 during PPL evals.
- The authors updated the M2.7 GGUF quant sets on Hugging Face (unsloth/MiniMax-M2.7-GGUF) to alleviate the NaN problem, though they still cannot confirm the exact underlying cause of the perplexity NaNs.
- Benchmarks using tail-percentile metrics like 99.9% KLD indicated that quantization quality remains fine on many measures even though perplexity evaluation can fail for affected quant types.