Hi, AesSedai here - I've put up a PR to support text-to-text inference of MiMo V2.5 with llama.cpp (it should also support Pro; I'll work on those quants after finishing V2.5): https://github.com/ggml-org/llama.cpp/pull/22493

I've also put some quants up on HF (https://huggingface.co/AesSedai/MiMo-V2.5-GGUF): the Q8_0 as well as my usual MoE-optimized quants (for those unfamiliar, that's basically Q8_0 or Q6_K for most of the model, with the FFNs quanted down). There is a weird NaN issue with the Q4_K_M that I'm looking into; I believe it's the ffn_down_exps tensor on layer 47 (edit: fixed the NaN issue, uploading the working Q4_K_M now!).

Bartowski, Ubergarm, Unsloth, and the rest of our lovely llama quanting cartel should be following up with their own quants in the near future. Since this is pre-merge there might still be some changes, but hopefully the PR gets reviewed and merged soon. Please let me know if there are any issues.
MiMo-V2.5-GGUF (preview available)
Reddit r/LocalLLaMA / 4/29/2026
📰 News · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage · Models & Research
Key Points
- AesSedai has opened a PR to enable text-to-text inference of MiMo V2.5 using llama.cpp, with support expected to extend to related variants (e.g., Pro) after V2.5 work is completed.
- Quantized MiMo V2.5 GGUF models have been uploaded to Hugging Face, including Q8_0 and MoE-optimized quantizations that keep most of the model at Q8_0 or Q6_K while quantizing the FFN expert tensors down for efficiency.
- A NaN issue was identified in the Q4_K_M quantization, attributed to the ffn_down_exps tensor on layer 47; a fix has been made and a working Q4_K_M re-uploaded.
- The author notes that additional quantization efforts from other community maintainers are likely to follow, but the PR is still pre-merge and could change before being reviewed and merged.
- Users are encouraged to test the pre-merge builds and report any issues so the PR can be reviewed and merged quickly.
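To give a feel for why the "MoE-optimized" approach pays off, here is a minimal back-of-the-envelope sketch of the file-size savings from keeping most tensors near-lossless while quantizing only the FFN expert tensors down. The parameter split below is hypothetical (not MiMo V2.5's actual architecture), and the bits-per-weight figures are llama.cpp's nominal values for Q8_0 and Q4_K, not measured sizes.

```python
# Rough size estimate for a "MoE-optimized" quant: keep attention and shared
# tensors near-lossless (Q8_0) and quantize only the FFN expert tensors down
# (here Q4_K). The 100B / 80%-expert split is a hypothetical example.

Q8_0_BPW = 8.5  # nominal bits per weight for Q8_0 (8-bit blocks + scales)
Q4_K_BPW = 4.5  # nominal bits per weight for Q4_K

def gguf_size_gb(expert_params: float, other_params: float,
                 expert_bpw: float, other_bpw: float) -> float:
    """Approximate file size in GB for a split-precision quantization."""
    total_bits = expert_params * expert_bpw + other_params * other_bpw
    return total_bits / 8 / 1e9

# Hypothetical 100B-parameter MoE where 80% of weights live in FFN experts.
full_q8 = gguf_size_gb(80e9, 20e9, Q8_0_BPW, Q8_0_BPW)
moe_opt = gguf_size_gb(80e9, 20e9, Q4_K_BPW, Q8_0_BPW)

print(f"all Q8_0:      {full_q8:.2f} GB")  # 106.25 GB
print(f"MoE-optimized: {moe_opt:.2f} GB")  #  66.25 GB
```

Because expert tensors dominate the parameter count in a typical MoE, dropping just those to a 4-bit format shrinks the file substantially while leaving the quality-sensitive shared weights at high precision.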
Related Articles

Black Hat USA
AI Business
LLMs will be a commodity
Reddit r/artificial

Indian Developers: How to Build AI Side Income with $0 Capital in 2026
Dev.to

HubSpot Just Legitimized AEO: What It Means for Your Brand AI Visibility
Dev.to

What it feels like to have Qwen 3.6 or Gemma 4 running locally
Reddit r/LocalLLaMA