Hi, AesSedai here - I've put up a PR to support text-to-text inference of MiMo V2.5 with llama.cpp (it should also support Pro; I'll work on those quants after finishing V2.5): https://github.com/ggml-org/llama.cpp/pull/22493

I've also put some quants up on HF (https://huggingface.co/AesSedai/MiMo-V2.5-GGUF): the Q8_0 as well as my usual MoE-optimized quants (for those unfamiliar, that's basically Q8_0 or Q6_K for most of the model, with the FFNs quanted down). There is a weird NaN issue with the Q4_K_M that I'm looking into; I believe it's the ffn_down_exps tensor on layer 47 (edit: fixed the NaN issue, uploading the working Q4_K_M now!).

Bartowski, Ubergarm, Unsloth, and the rest of our lovely llama quanting cartel should be following up with their own quants in the near future. Since this is pre-merge there might still be some changes, but hopefully the PR gets reviewed and merged soon. Please let me know if there are any issues.
MiMo-V2.5-GGUF (preview available)
Reddit r/LocalLLaMA / 4/29/2026
📰 News · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- AesSedai has opened a PR to enable text-to-text inference of MiMo V2.5 using llama.cpp, with support expected to extend to related variants (e.g., Pro) after V2.5 work is completed.
- Quantized MiMo V2.5 GGUF models have been uploaded to Hugging Face, including Q8_0 and MoE-optimized quantizations that keep most of the model at Q8_0 or Q6_K while quantizing the FFN expert tensors down for efficiency.
- A NaN issue was identified in the Q4_K_M quantization, attributed to the ffn_down_exps tensor on layer 47; a fix has been made and a working Q4_K_M re-uploaded.
- The author notes that additional quantization efforts from other community maintainers are likely to follow, but the PR is still pre-merge and could change before being reviewed and merged.
- Users are encouraged to test the pre-merge builds and report any issues so the PR can be reviewed and merged quickly.
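To give a feel for why the "MoE-optimized" approach pays off, here is a minimal back-of-the-envelope sketch of the file-size savings from keeping most tensors near-lossless while quantizing only the FFN expert tensors down. The parameter split below is hypothetical (not MiMo V2.5's actual architecture), and the bits-per-weight figures are llama.cpp's nominal values for Q8_0 and Q4_K, not measured sizes.

```python
# Rough size estimate for a "MoE-optimized" quant: keep attention and shared
# tensors near-lossless (Q8_0) and quantize only the FFN expert tensors down
# (here Q4_K). The 100B / 80%-expert split is a hypothetical example.

Q8_0_BPW = 8.5  # nominal bits per weight for Q8_0 (8-bit blocks + scales)
Q4_K_BPW = 4.5  # nominal bits per weight for Q4_K

def gguf_size_gb(expert_params: float, other_params: float,
                 expert_bpw: float, other_bpw: float) -> float:
    """Approximate file size in GB for a split-precision quantization."""
    total_bits = expert_params * expert_bpw + other_params * other_bpw
    return total_bits / 8 / 1e9

# Hypothetical 100B-parameter MoE where 80% of weights live in FFN experts.
full_q8 = gguf_size_gb(80e9, 20e9, Q8_0_BPW, Q8_0_BPW)
moe_opt = gguf_size_gb(80e9, 20e9, Q4_K_BPW, Q8_0_BPW)

print(f"all Q8_0:      {full_q8:.2f} GB")  # 106.25 GB
print(f"MoE-optimized: {moe_opt:.2f} GB")  #  66.25 GB
```

Because expert tensors dominate the parameter count in a typical MoE, dropping just those to a 4-bit format shrinks the file substantially while leaving the quality-sensitive shared weights at high precision.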
Related Articles

Black Hat USA
AI Business
LLMs will be a commodity
Reddit r/artificial

Indian Developers: How to Build AI Side Income with $0 Capital in 2026
Dev.to

HubSpot Just Legitimized AEO: What It Means for Your Brand AI Visibility
Dev.to

What it feels like to have Qwen 3.6 or Gemma 4 running locally
Reddit r/LocalLLaMA