Request: Training a pretrained, MoE version of Mistral Nemo

Reddit r/LocalLLaMA / 2026/3/24

💬 Opinion · Signals & Early Trends · Models & Research

Key Points

  • A Reddit user reports converting the dense “Mistral Nemo” model into a 16-expert Mixture-of-Experts (MoE) version, but says budget constraints prevent the full fine-tuning needed to restore its quality.

I converted Mistral Nemo from a dense model into a sixteen-expert MoE model: https://huggingface.co/blascotobasco/Mistral-NeMoE-12B-16E
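The post does not spell out how the conversion was performed. A common recipe for turning a dense checkpoint into an MoE is "sparse upcycling": each expert starts as a copy of the dense feed-forward block, and a small, freshly initialized router learns to mix them. Below is a minimal PyTorch sketch of that idea; the class and parameter names are illustrative assumptions, not the author's code or Mistral Nemo's actual module layout.

```python
# Sketch of dense-to-MoE "upcycling" (illustrative, not the author's procedure):
# every expert is initialized from the dense FFN, and a randomly initialized
# router mixes the top-k experts per token.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpcycledMoE(nn.Module):
    def __init__(self, dense_ffn: nn.Module, hidden_size: int,
                 num_experts: int = 16, top_k: int = 2):
        super().__init__()
        # Each expert starts as an exact copy of the dense FFN, so the
        # untrained MoE reproduces the dense layer's output (identical experts,
        # routing weights sum to 1).
        self.experts = nn.ModuleList(
            [copy.deepcopy(dense_ffn) for _ in range(num_experts)]
        )
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden_size)
        logits = self.router(x)                           # (tokens, num_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out
```

Because the experts start identical, an exact-copy upcycle behaves like the dense model until training pushes the experts apart; the coherence problems described below suggest the author's conversion deviated from this scheme in some way.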

The core problem is that I am a student with budget constraints and can’t afford full-parameter or extended fine-tuning. I did my best to restore coherence, and it worked, but the model currently gets a lot of things wrong and ignores instructions half the time.

I can’t offer anything for it, but I hope someone takes interest in this model. I worked pretty hard on it, but I’ve kinda hit the limit of what I can do with my budget and a rental GPU. The cool part is that if someone releases a trained version, I can expand the expert pool and release a version with expanded parameter capacity (it would have the same capabilities as the source model before training).
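The "expand the expert pool without losing capability" idea can be done in a function-preserving way: new experts are copies of existing ones, and their router logits start heavily penalized so they are never selected until further training. A hypothetical sketch building on the UpcycledMoE module above (again an assumption about the approach, not the author's method):

```python
# Hypothetical function-preserving expert-pool expansion: the expanded model
# initially behaves exactly like the original, because the new experts' router
# logits are offset so far down that top-k routing never picks them.
import copy
import torch
import torch.nn as nn

def expand_expert_pool(moe, extra: int):
    hidden = moe.router.in_features
    old_n = len(moe.experts)
    # New experts are copies of existing ones, assigned round-robin.
    for i in range(extra):
        moe.experts.append(copy.deepcopy(moe.experts[i % old_n]))
    new_router = nn.Linear(hidden, old_n + extra, bias=True)
    with torch.no_grad():
        new_router.weight[:old_n] = moe.router.weight
        new_router.weight[old_n:] = moe.router.weight[torch.arange(extra) % old_n]
        new_router.bias.zero_()
        new_router.bias[old_n:] = -1e4   # new experts start effectively disabled
    moe.router = new_router
    return moe
```

Training can then gradually reduce the negative bias and specialize the new experts, which is roughly the property the author describes: the expanded model starts with the same capabilities it had before the extra experts were added.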

submitted by /u/Destroy-My-Asshole