[R] Structure Over Scale: Memory-First Reasoning and Depth-Pruned Efficiency in Magnus and Seed Architecture Auto-Discovery

Reddit r/MachineLearning / 3/31/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The article reports a small experiment on “Seed” architecture auto-discovery, focusing on finding memory-first, smaller reasoning-capable models rather than scaling parameter counts.
  • Across four intent datasets (Banking77, CLINC150, HWU64, MASSIVE), the “Dynamic Seed Distill” approach often achieves competitive accuracy with roughly 4–5× fewer parameters than a larger “Logistic TF-IDF” baseline.
  • On Banking77, a distilled dynamic seed model reaches higher accuracy than the baseline while using far fewer parameters (~12.6k vs ~64.9k), highlighting the efficiency potential of structure-first search.
  • For CLINC150 and HWU64, the results are mixed—dynamic/dynamic-distilled seeds are smaller and faster in inference time but do not always outperform the strongest baseline accuracy.
  • The author’s overarching takeaway is that automated structure search (Seed) can identify the smallest architecture that still performs well, contrasting with traditional “scale up and hope” strategies.
| Dataset | Model | Acc | F1 | Δ vs Log | Δ vs Static | Avg Params | Peak Params | Steps | Infer (ms) | Size |
|---|---|---|---|---|---|---|---|---|---|---|
| Banking77-20 | Logistic TF-IDF | 92.37% | 0.9230 | +0.00pp | +0.76pp | 64,940 | 64,940 | 0.00M | 0.473 | 1.000x |
| Banking77-20 | Static Seed | 91.61% | 0.9164 | -0.76pp | +0.00pp | 52,052 | 52,052 | 94.56M | 0.264 | 0.801x |
| Banking77-20 | Dynamic Seed Distill | 93.53% | 0.9357 | +1.17pp | +1.92pp | 12,648 | 16,881 | 70.46M | 0.232 | 0.195x |
| CLINC150 | Logistic TF-IDF | 97.00% | 0.9701 | +0.00pp | +1.78pp | 41,020 | 41,020 | 0.00M | 0.000 | 1.000x |
| CLINC150 | Static Seed | 95.22% | 0.9521 | -1.78pp | +0.00pp | 52,052 | 52,052 | 66.80M | 0.302 | 1.269x |
| CLINC150 | Dynamic Seed | 94.78% | 0.9485 | -2.22pp | -0.44pp | 10,092 | 10,136 | 28.41M | 0.324 | 0.246x |
| CLINC150 | Dynamic Seed Distill | 95.44% | 0.9544 | -1.56pp | +0.22pp | 9,956 | 9,956 | 32.69M | 0.255 | 0.243x |
| HWU64 | Logistic TF-IDF | 87.94% | 0.8725 | +0.00pp | +0.81pp | 42,260 | 42,260 | 0.00M | 0.000 | 1.000x |
| HWU64 | Static Seed | 87.13% | 0.8674 | -0.81pp | +0.00pp | 52,052 | 52,052 | 146.61M | 0.300 | 1.232x |
| HWU64 | Dynamic Seed | 86.63% | 0.8595 | -1.31pp | -0.50pp | 12,573 | 17,565 | 62.54M | 0.334 | 0.297x |
| HWU64 | Dynamic Seed Distill | 87.23% | 0.8686 | -0.71pp | +0.10pp | 13,117 | 17,575 | 62.86M | 0.340 | 0.310x |
| MASSIVE-20 | Logistic TF-IDF | 86.06% | 0.7324 | +0.00pp | -1.92pp | 74,760 | 74,760 | 0.00M | 0.000 | 1.000x |
| MASSIVE-20 | Static Seed | 87.98% | 0.8411 | +1.92pp | +0.00pp | 52,052 | 52,052 | 129.26M | 0.247 | 0.696x |
| MASSIVE-20 | Dynamic Seed | 86.94% | 0.7364 | +0.88pp | -1.04pp | 11,595 | 17,565 | 47.62M | 0.257 | 0.155x |
| MASSIVE-20 | Dynamic Seed Distill | 86.45% | 0.7380 | +0.39pp | -1.53pp | 11,851 | 19,263 | 51.90M | 0.442 | 0.159x |
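The post never shares its baseline code, so as a point of reference, here is a minimal sketch of what a "Logistic TF-IDF" intent classifier typically looks like in scikit-learn. The toy utterances, labels, and hyperparameters (`ngram_range`, `sublinear_tf`) are assumptions for illustration, not the author's actual setup.

```python
# Minimal sketch of a "Logistic TF-IDF" intent baseline (assumed setup;
# the post does not publish its code or hyperparameters).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for an intent dataset like Banking77.
texts = [
    "I lost my card", "my card is missing",
    "what is my balance", "show account balance",
    "transfer money to savings", "send funds to my savings account",
]
labels = ["card_lost", "card_lost", "balance", "balance", "transfer", "transfer"]

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),  # word + bigram features
    LogisticRegression(max_iter=1000),
)
baseline.fit(texts, labels)

print(baseline.predict(["where is my card"])[0])
```

Note the parameter count of such a model is dominated by the TF-IDF vocabulary size times the number of classes, which is why the baseline's footprint grows with the dataset while the seed models stay small.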

Built a small experiment around Seed (architecture discovery)

Tested across 4 intent datasets:

Banking77
CLINC150
HWU64
MASSIVE

Results surprised me.

On Banking77:

Logistic TF-IDF: 92.37%
Dynamic Seed (distilled): 93.53%

At ~5x smaller (12.6k vs 64.9k params)

Across the others:

CLINC150 / HWU64 → not always higher accuracy
but ~4–5x smaller models with competitive performance
MASSIVE → wins on both accuracy and size over the TF-IDF baseline

Key pattern:

Dynamic Seed finds much smaller architectures
that stay competitive — and sometimes outperform strong baselines

This isn’t about bigger models.
It’s about:
finding the smallest model that still wins

Traditional approach:
scale size → hope for gains

Seed:
search structure → compress intelligently
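The post doesn't describe Seed's actual discovery procedure, but the "search structure → compress intelligently" idea can be illustrated with a toy sweep: train candidate widths smallest-first and keep the first one whose validation score is within a tolerance of the best. Every name and number below (the candidates, the 2pp tolerance, the synthetic data) is a hypothetical stand-in.

```python
# Toy "smallest model that still wins" search (illustrative only; the
# actual Seed discovery procedure is not described in the post).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def param_count(widths, n_in=20, n_out=4):
    """Dense parameter count (weights + biases) for the given hidden widths."""
    dims = [n_in, *widths, n_out]
    return sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))

candidates = [(4,), (8,), (16,), (32,), (64,)]  # smallest first
scores = {}
for widths in candidates:
    clf = MLPClassifier(hidden_layer_sizes=widths, max_iter=800, random_state=0)
    clf.fit(X_tr, y_tr)
    scores[widths] = clf.score(X_val, y_val)

best = max(scores.values())
tolerance = 0.02  # accept up to 2pp below the best validation score
chosen = next(w for w in candidates if scores[w] >= best - tolerance)
print(chosen, param_count(chosen), round(scores[chosen], 3))
```

A real structure search would explore depth, connectivity, and per-layer operations rather than a single width, but the selection principle is the same: prefer the smallest architecture inside the quality tolerance.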

Some takeaways:

Static models often lose
Dynamic discovery consistently improves efficiency
Distillation helps stabilize small models
Structure matters more than uniform scaling
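The post credits distillation with stabilizing the small discovered models but doesn't say how "Dynamic Seed Distill" implements it. For context, the standard soft-target loss from Hinton et al. (2015) looks like this; assume this is only a generic sketch, not the author's method.

```python
# Standard soft-target distillation loss (generic sketch; the post does
# not specify how "Dynamic Seed Distill" implements distillation).
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in Hinton et al. (2015)."""
    p = softmax(teacher_logits, T)          # soft teacher targets
    q = softmax(student_logits, T)
    return T * T * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()

teacher = np.array([[4.0, 1.0, 0.0]])
aligned = np.array([[4.0, 1.0, 0.0]])
off     = np.array([[0.0, 1.0, 4.0]])
print(distill_loss(aligned, teacher))  # 0 when the student matches the teacher
print(distill_loss(off, teacher) > distill_loss(aligned, teacher))
```

The temperature spreads probability mass over wrong-but-plausible intents, which gives a small student a richer training signal than one-hot labels alone — a plausible mechanism for the stabilization the author observes.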

This is the direction behind Seed AutoArch:
automatically discovering efficient models for real tasks
Not AGI
Not “we solved NLU”
But a real signal that:

structure > scale

What do you guys make of this?

submitted by /u/califalcon