Request: Training a pretrained, MoE version of Mistral Nemo

Reddit r/LocalLLaMA / 2026/3/24

💬 Opinion · Signals & Early Trends · Models & Research

Key Points

  • A Reddit user reports converting the dense “Mistral Nemo” model into a 16-expert Mixture-of-Experts (MoE) version, but says budget constraints prevent the full fine-tuning needed to restore its quality.

I converted Mistral Nemo from a dense model into a sixteen-expert MoE model: https://huggingface.co/blascotobasco/Mistral-NeMoE-12B-16E
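The post does not spell out how the conversion was performed. A common recipe for turning a dense checkpoint into an MoE is "sparse upcycling": each expert starts as a copy of the dense feed-forward block, and a small, freshly initialized router learns to mix them. Below is a minimal PyTorch sketch of that idea; the class and parameter names are illustrative assumptions, not the author's code or Mistral Nemo's actual module layout.

```python
# Sketch of dense-to-MoE "upcycling" (illustrative, not the author's procedure):
# every expert is initialized from the dense FFN, and a randomly initialized
# router mixes the top-k experts per token.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpcycledMoE(nn.Module):
    def __init__(self, dense_ffn: nn.Module, hidden_size: int,
                 num_experts: int = 16, top_k: int = 2):
        super().__init__()
        # Each expert starts as an exact copy of the dense FFN, so the
        # untrained MoE reproduces the dense layer's output (identical experts,
        # routing weights sum to 1).
        self.experts = nn.ModuleList(
            [copy.deepcopy(dense_ffn) for _ in range(num_experts)]
        )
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden_size)
        logits = self.router(x)                           # (tokens, num_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out
```

Because the experts start identical, an exact-copy upcycle behaves like the dense model until training pushes the experts apart; the coherence problems described below suggest the author's conversion deviated from this scheme in some way.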

The core problem is that I am a student with budget constraints and can’t afford full-parameter or extended fine-tuning. I did my best to restore coherence, and it worked, but the model currently gets a lot of things wrong and ignores instructions half the time.

I can’t offer anything for it, but I hope someone takes interest in this model. I worked pretty hard on it, but I’ve kinda hit the limit of what I can do with my budget and a rental GPU. The cool part is that if someone releases a trained version, I can expand the expert pool and release a version with expanded parameter capacity (it would have the same capabilities as the source model before training).
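The "expand the expert pool without losing capability" idea can be done in a function-preserving way: new experts are copies of existing ones, and their router logits start heavily penalized so they are never selected until further training. A hypothetical sketch building on the UpcycledMoE module above (again an assumption about the approach, not the author's method):

```python
# Hypothetical function-preserving expert-pool expansion: the expanded model
# initially behaves exactly like the original, because the new experts' router
# logits are offset so far down that top-k routing never picks them.
import copy
import torch
import torch.nn as nn

def expand_expert_pool(moe, extra: int):
    hidden = moe.router.in_features
    old_n = len(moe.experts)
    # New experts are copies of existing ones, assigned round-robin.
    for i in range(extra):
        moe.experts.append(copy.deepcopy(moe.experts[i % old_n]))
    new_router = nn.Linear(hidden, old_n + extra, bias=True)
    with torch.no_grad():
        new_router.weight[:old_n] = moe.router.weight
        new_router.weight[old_n:] = moe.router.weight[torch.arange(extra) % old_n]
        new_router.bias.zero_()
        new_router.bias[old_n:] = -1e4   # new experts start effectively disabled
    moe.router = new_router
    return moe
```

Training can then gradually reduce the negative bias and specialize the new experts, which is roughly the property the author describes: the expanded model starts with the same capabilities it had before the extra experts were added.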

submitted by /u/Destroy-My-Asshole