"The Child That Surpassed Both Parents" Darwin-35B-A3B-Opus (35B/3B MoE) with Model MRI Technique

Reddit r/LocalLLaMA / 4/1/2026


Key Points

  • Darwin-35B-A3B-Opus is a 35B Mixture-of-Experts model (with only ~3B active parameters) created by SeaWolf-AI/VIDRAFT_LAB using their Darwin V5 model-merging engine.
  • The team used a layer-by-layer “Model MRI/CT-scan” method to identify which components from two parent models contribute most effectively to reasoning performance.
  • The merge strategy selectively transplants the distilled Claude 4.6 Opus reasoning layers (notably L34–L38) while swapping in Qwen3.5-35B-A3B’s “healthy experts,” with the father’s router driving the outputs.
  • Reported benchmark improvements include GPQA Diamond rising to 90.0% versus 84.2% (father) and 85.0% (mother), with MMMLU roughly matching the father (~85%), and multimodal + multilingual performance largely preserved.
  • The model is claimed to be fast (~148 tok/s on H100, runs on a single RTX 4090 in Q4), is fully open under Apache 2.0, and the authors plan to release the full Darwin V5 algorithm and paper soon.

Darwin-35B-A3B-Opus is a 35B MoE model (only 3B parameters active) created by SeaWolf-AI / VIDRAFT_LAB using their new Darwin V5 merging engine.

They built a system that does a deep "CT-scan" (Model MRI) of the parent models layer by layer to pinpoint which components actually contribute to performance.
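The post doesn't share the Darwin V5 internals, but you can approximate a layer-wise "MRI" with simple transplant probing: swap one decoder layer at a time from the mother into the father and measure the score delta on a small held-out reasoning set. A minimal sketch; the repo ids and the loss-based proxy metric are placeholders, not the authors' actual pipeline:

```python
# Hypothetical layer-wise "Model MRI": transplant one mother layer at a time
# into the father and measure the change on a small reasoning probe set.
# Repo ids and the scoring proxy are assumptions, not the Darwin V5 code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

FATHER_ID = "Qwen/Qwen3.5-35B-A3B"          # assumed repo id
MOTHER_ID = "example/opus-distill-35b-a3b"  # hypothetical repo id

tok = AutoTokenizer.from_pretrained(FATHER_ID)
father = AutoModelForCausalLM.from_pretrained(FATHER_ID, torch_dtype=torch.bfloat16)
mother = AutoModelForCausalLM.from_pretrained(MOTHER_ID, torch_dtype=torch.bfloat16)

PROBES = ["Q: ...", "Q: ..."]  # tiny held-out reasoning set (placeholder)

def score(model):
    """Proxy metric: mean negative LM loss on the probe set (higher is better)."""
    model.eval()
    total = 0.0
    with torch.no_grad():
        for text in PROBES:
            ids = tok(text, return_tensors="pt").input_ids
            total -= model(ids, labels=ids).loss.item()
    return total / len(PROBES)

baseline = score(father)
deltas = {}
for idx in range(len(father.model.layers)):
    original = father.model.layers[idx]
    father.model.layers[idx] = mother.model.layers[idx]  # patch mother's layer in
    deltas[idx] = score(father) - baseline
    father.model.layers[idx] = original                  # restore the father

# Layers with the biggest positive delta (per the post: L34–L38) are the
# transplant candidates.
for idx, d in sorted(deltas.items(), key=lambda kv: -kv[1])[:5]:
    print(f"layer {idx:2d}: Δscore = {d:+.4f}")
```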

Father: Qwen3.5-35B-A3B (strong generalist)

Mother: Claude 4.6 Opus distilled (strong reasoning but apparently had a lot of "dead experts" after distillation)
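"Dead experts" are detectable from routing statistics: push a calibration set through the model and count how many tokens each expert actually receives; experts that never fire are dead weight. A rough sketch with forward hooks — the `layer.mlp.gate` module path follows common Qwen-style MoE naming and is an assumption here:

```python
# Hypothetical dead-expert scan: count router assignments per expert across a
# calibration set. The `layer.mlp.gate` path is an assumed (Qwen-style) layout.
import torch
from collections import Counter

def find_dead_experts(model, tok, texts, num_experts, top_k=8):
    counts = {i: Counter() for i in range(len(model.model.layers))}
    hooks = []

    def make_hook(layer_idx):
        def hook(module, inputs, output):
            # Router logits assumed shape [num_tokens, num_experts]
            logits = output[0] if isinstance(output, tuple) else output
            chosen = logits.topk(top_k, dim=-1).indices.flatten().tolist()
            counts[layer_idx].update(chosen)
        return hook

    for i, layer in enumerate(model.model.layers):
        hooks.append(layer.mlp.gate.register_forward_hook(make_hook(i)))

    with torch.no_grad():
        for text in texts:
            model(tok(text, return_tensors="pt").input_ids)

    for h in hooks:
        h.remove()

    # An expert that never received a token in the scan is "dead".
    return {i: [e for e in range(num_experts) if counts[i][e] == 0] for i in counts}
```

On a healthy MoE most experts see traffic; the post's claim is that distillation left many of the mother's experts unused, which is exactly what this kind of scan would surface.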

The merge strategy: transplant the mother's strong reasoning layers (especially L34–L38), swap in the father's healthy experts, and let the father's router handle the output.
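The post doesn't publish the transplant mechanics, but one plausible reading of the recipe maps directly onto state-dict surgery: start from the father (so its experts and router are kept everywhere), then copy over the mother's weights for layers 34–38 except the expert FFNs and router gate. A rough sketch under those assumptions (key prefixes follow common HF MoE layouts, not the actual Darwin V5 code):

```python
# Rough state-dict surgery for one reading of the described recipe.
# Repo ids and key prefixes are assumptions, not the Darwin V5 implementation.
from transformers import AutoModelForCausalLM

father = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-35B-A3B")          # assumed id
mother = AutoModelForCausalLM.from_pretrained("example/opus-distill-35b-a3b")  # hypothetical

child_sd = father.state_dict()     # father supplies experts + router everywhere
mother_sd = mother.state_dict()

REASONING_LAYERS = range(34, 39)   # L34–L38, per the post

for key in child_sd:
    if any(key.startswith(f"model.layers.{i}.") for i in REASONING_LAYERS):
        # Take the mother's attention/norm weights for these layers, but keep
        # the father's expert FFNs and router gate ("healthy experts",
        # father's router).
        if ".experts." not in key and ".gate." not in key:
            child_sd[key] = mother_sd[key].clone()

father.load_state_dict(child_sd)
father.save_pretrained("darwin-merge-sketch")  # hypothetical output directory
```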

Reported results:

GPQA Diamond: 90.0% 🔥

→ Father: 84.2%

→ Mother: 85.0%

→ That's a solid +5.0–5.8 point jump over its parents with no major trade-offs

MMMLU: 85.0% (basically the same as Father at 85.2%)

Fully preserves multimodal (image + video) support and coverage of all 201 languages

262K native context

Blazing fast: ~148 tok/s on an H100, and it runs on a single RTX 4090 in Q4 (see the 4-bit loading sketch below)

License: Apache 2.0 — fully open.
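For the single-4090 claim, the standard route is 4-bit loading: ~35B weights at 4 bits is roughly 18 GB, which fits in 24 GB of VRAM. A minimal sketch with transformers + bitsandbytes; the repo id is from the post's model page, and the rest is a stock NF4 loading recipe rather than anything model-specific:

```python
# Load the model 4-bit-quantized (Q4-class) so it fits on a 24 GB RTX 4090.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "FINAL-Bench/Darwin-35B-A3B-Opus"  # from the post's model page

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",
)

prompt = "Explain why MoE models activate only a few experts per token."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```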

They call it "the child that surpassed both parents" and plan to release the full Darwin V5 algorithm + paper soon.

Model page: https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus

submitted by /u/Own-Potential-2308