"The Child That Surpassed Both Parents" Darwin-35B-A3B-Opus (35B/3B MoE) with Model MRI Technique

Reddit r/LocalLLaMA / 4/1/2026


Key Points

  • Darwin-35B-A3B-Opus is a 35B Mixture-of-Experts model (with only ~3B active parameters) created by SeaWolf-AI/VIDRAFT_LAB using their Darwin V5 model-merging engine.
  • The team used a layer-by-layer “Model MRI/CT-scan” method to identify which components from two parent models contribute most effectively to reasoning performance.
  • The merge strategy selectively transplants the distilled Claude 4.6 Opus reasoning layers (notably L34–L38) while swapping in Qwen3.5-35B-A3B’s “healthy experts,” with the father’s router driving the outputs.
  • Reported benchmark improvements include GPQA Diamond rising to 90.0% versus 84.2% (father) and 85.0% (mother), with MMMLU roughly matching the father (~85%), and multimodal + multilingual performance largely preserved.
  • The model is claimed to be fast (~148 tok/s on H100, runs on a single RTX 4090 in Q4), is fully open under Apache 2.0, and the authors plan to release the full Darwin V5 algorithm and paper soon.

Darwin-35B-A3B-Opus is a 35B MoE model (only 3B parameters active) created by SeaWolf-AI / VIDRAFT_LAB using their new Darwin V5 merging engine.

They built a system that does a deep "CT-scan" (Model MRI) of the parent models layer by layer to pinpoint which components actually contribute to performance.
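The post doesn't share the Darwin V5 internals, but you can approximate a layer-wise "MRI" with simple transplant probing: swap one decoder layer at a time from the mother into the father and measure the score delta on a small held-out reasoning set. A minimal sketch; the repo ids and the loss-based proxy metric are placeholders, not the authors' actual pipeline:

```python
# Hypothetical layer-wise "Model MRI": transplant one mother layer at a time
# into the father and measure the change on a small reasoning probe set.
# Repo ids and the scoring proxy are assumptions, not the Darwin V5 code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

FATHER_ID = "Qwen/Qwen3.5-35B-A3B"          # assumed repo id
MOTHER_ID = "example/opus-distill-35b-a3b"  # hypothetical repo id

tok = AutoTokenizer.from_pretrained(FATHER_ID)
father = AutoModelForCausalLM.from_pretrained(FATHER_ID, torch_dtype=torch.bfloat16)
mother = AutoModelForCausalLM.from_pretrained(MOTHER_ID, torch_dtype=torch.bfloat16)

PROBES = ["Q: ...", "Q: ..."]  # tiny held-out reasoning set (placeholder)

def score(model):
    """Proxy metric: mean negative LM loss on the probe set (higher is better)."""
    model.eval()
    total = 0.0
    with torch.no_grad():
        for text in PROBES:
            ids = tok(text, return_tensors="pt").input_ids
            total -= model(ids, labels=ids).loss.item()
    return total / len(PROBES)

baseline = score(father)
deltas = {}
for idx in range(len(father.model.layers)):
    original = father.model.layers[idx]
    father.model.layers[idx] = mother.model.layers[idx]  # patch mother's layer in
    deltas[idx] = score(father) - baseline
    father.model.layers[idx] = original                  # restore the father

# Layers with the biggest positive delta (per the post: L34–L38) are the
# transplant candidates.
for idx, d in sorted(deltas.items(), key=lambda kv: -kv[1])[:5]:
    print(f"layer {idx:2d}: Δscore = {d:+.4f}")
```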

Father: Qwen3.5-35B-A3B (strong generalist)

Mother: Claude 4.6 Opus distilled (strong reasoning but apparently had a lot of "dead experts" after distillation)
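"Dead experts" are detectable from routing statistics: push a calibration set through the model and count how many tokens each expert actually receives; experts that never fire are dead weight. A rough sketch with forward hooks — the `layer.mlp.gate` module path follows common Qwen-style MoE naming and is an assumption here:

```python
# Hypothetical dead-expert scan: count router assignments per expert across a
# calibration set. The `layer.mlp.gate` path is an assumed (Qwen-style) layout.
import torch
from collections import Counter

def find_dead_experts(model, tok, texts, num_experts, top_k=8):
    counts = {i: Counter() for i in range(len(model.model.layers))}
    hooks = []

    def make_hook(layer_idx):
        def hook(module, inputs, output):
            # Router logits assumed shape [num_tokens, num_experts]
            logits = output[0] if isinstance(output, tuple) else output
            chosen = logits.topk(top_k, dim=-1).indices.flatten().tolist()
            counts[layer_idx].update(chosen)
        return hook

    for i, layer in enumerate(model.model.layers):
        hooks.append(layer.mlp.gate.register_forward_hook(make_hook(i)))

    with torch.no_grad():
        for text in texts:
            model(tok(text, return_tensors="pt").input_ids)

    for h in hooks:
        h.remove()

    # An expert that never received a token in the scan is "dead".
    return {i: [e for e in range(num_experts) if counts[i][e] == 0] for i in counts}
```

On a healthy MoE most experts see traffic; the post's claim is that distillation left many of the mother's experts unused, which is exactly what this kind of scan would surface.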

The merge strategy: transplant the mother's strong reasoning layers (especially L34–L38), swap in the father's healthy experts, and let the father's router handle the output.
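The post doesn't publish the transplant mechanics, but one plausible reading of the recipe maps directly onto state-dict surgery: start from the father (so its experts and router are kept everywhere), then copy over the mother's weights for layers 34–38 except the expert FFNs and router gate. A rough sketch under those assumptions (key prefixes follow common HF MoE layouts, not the actual Darwin V5 code):

```python
# Rough state-dict surgery for one reading of the described recipe.
# Repo ids and key prefixes are assumptions, not the Darwin V5 implementation.
from transformers import AutoModelForCausalLM

father = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-35B-A3B")          # assumed id
mother = AutoModelForCausalLM.from_pretrained("example/opus-distill-35b-a3b")  # hypothetical

child_sd = father.state_dict()     # father supplies experts + router everywhere
mother_sd = mother.state_dict()

REASONING_LAYERS = range(34, 39)   # L34–L38, per the post

for key in child_sd:
    if any(key.startswith(f"model.layers.{i}.") for i in REASONING_LAYERS):
        # Take the mother's attention/norm weights for these layers, but keep
        # the father's expert FFNs and router gate ("healthy experts",
        # father's router).
        if ".experts." not in key and ".gate." not in key:
            child_sd[key] = mother_sd[key].clone()

father.load_state_dict(child_sd)
father.save_pretrained("darwin-merge-sketch")  # hypothetical output directory
```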

Reported results:

GPQA Diamond: 90.0% 🔥

→ Father: 84.2%

→ Mother: 85.0%

→ That's a solid +5.0–5.8 point jump over its parents with no major trade-offs

MMMLU: 85.0% (basically the same as Father at 85.2%)

Fully preserves multimodal (image + video) support and coverage of all 201 languages

262K native context

Blazing fast: ~148 tok/s on an H100, and it runs on a single RTX 4090 in Q4 (see the 4-bit loading sketch below)

License: Apache 2.0 — fully open.
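For the single-4090 claim, the standard route is 4-bit loading: ~35B weights at 4 bits is roughly 18 GB, which fits in 24 GB of VRAM. A minimal sketch with transformers + bitsandbytes; the repo id is from the post's model page, and the rest is a stock NF4 loading recipe rather than anything model-specific:

```python
# Load the model 4-bit-quantized (Q4-class) so it fits on a 24 GB RTX 4090.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "FINAL-Bench/Darwin-35B-A3B-Opus"  # from the post's model page

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",
)

prompt = "Explain why MoE models activate only a few experts per token."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```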

They call it "the child that surpassed both parents" and plan to release the full Darwin V5 algorithm + paper soon.

Model page: https://huggingface.co/FINAL-Bench/Darwin-35B-A3B-Opus

submitted by /u/Own-Potential-2308