One year ago DeepSeek R1 was 25 times bigger than Gemma 4

Reddit r/LocalLLaMA / 4/5/2026

💬 Opinion · Signals & Early Trends · Models & Research

Key Points

  • The post compares DeepSeek R1’s reported 671B-parameter MoE model from about a year ago with today’s Gemma 4 MoE at about 26B parameters.
  • It highlights that Gemma 4 is approximately 25x smaller than DeepSeek R1, raising the question of whether the smaller model is correspondingly worse in quality.
  • The author expresses excitement about progress in local LLMs and implies rapid improvements in efficiency and capability.
  • Overall, the content frames the development as an encouraging signal for the future feasibility of running stronger models locally.

I'm mind-blown by the fact that about a year ago DeepSeek R1 came out with a MoE architecture at 671B parameters, and today Gemma 4 MoE is only 26B and is genuinely impressive. It's 25 times smaller, but is it 25 times worse?

I'm excited about the future of local LLMs.

submitted by /u/rinaldo23