| Dataset | Model | Acc | F1 | Δ Acc vs Logistic | Δ Acc vs Static | Avg Params | Peak Params | Steps | Infer (ms) | Size (vs Logistic) |
|---|---|---|---|---|---|---|---|---|---|---|
| Banking77-20 | Logistic TF-IDF | 92.37% | 0.9230 | +0.00pp | +0.76pp | 64,940 | 64,940 | 0.00M | 0.473 | 1.000x |
| | Static Seed | 91.61% | 0.9164 | -0.76pp | +0.00pp | 52,052 | 52,052 | 94.56M | 0.264 | 0.801x |
| | Dynamic Seed Distill | 93.53% | 0.9357 | +1.17pp | +1.92pp | 12,648 | 16,881 | 70.46M | 0.232 | 0.195x |
| CLINC150 | Logistic TF-IDF | 97.00% | 0.9701 | +0.00pp | +1.78pp | 41,020 | 41,020 | 0.00M | 0.000 | 1.000x |
| | Static Seed | 95.22% | 0.9521 | -1.78pp | +0.00pp | 52,052 | 52,052 | 66.80M | 0.302 | 1.269x |
| | Dynamic Seed | 94.78% | 0.9485 | -2.22pp | -0.44pp | 10,092 | 10,136 | 28.41M | 0.324 | 0.246x |
| | Dynamic Seed Distill | 95.44% | 0.9544 | -1.56pp | +0.22pp | 9,956 | 9,956 | 32.69M | 0.255 | 0.243x |
| HWU64 | Logistic TF-IDF | 87.94% | 0.8725 | +0.00pp | +0.81pp | 42,260 | 42,260 | 0.00M | 0.000 | 1.000x |
| | Static Seed | 87.13% | 0.8674 | -0.81pp | +0.00pp | 52,052 | 52,052 | 146.61M | 0.300 | 1.232x |
| | Dynamic Seed | 86.63% | 0.8595 | -1.31pp | -0.50pp | 12,573 | 17,565 | 62.54M | 0.334 | 0.297x |
| | Dynamic Seed Distill | 87.23% | 0.8686 | -0.71pp | +0.10pp | 13,117 | 17,575 | 62.86M | 0.340 | 0.310x |
| MASSIVE-20 | Logistic TF-IDF | 86.06% | 0.7324 | +0.00pp | -1.92pp | 74,760 | 74,760 | 0.00M | 0.000 | 1.000x |
| | Static Seed | 87.98% | 0.8411 | +1.92pp | +0.00pp | 52,052 | 52,052 | 129.26M | 0.247 | 0.696x |
| | Dynamic Seed | 86.94% | 0.7364 | +0.88pp | -1.04pp | 11,595 | 17,565 | 47.62M | 0.257 | 0.155x |
| | Dynamic Seed Distill | 86.45% | 0.7380 | +0.39pp | -1.53pp | 11,851 | 19,263 | 51.90M | 0.442 | 0.159x |

Built a small experiment around Seed (architecture discovery).
Tested across 4 intent datasets:

- Banking77
- CLINC150
- HWU64
- MASSIVE
Results surprised me.
On Banking77:

- Logistic TF-IDF: 92.37%
- Dynamic Seed (distilled): 93.53%, at ~5x smaller (12.6k vs 64.9k params)
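For reference, the "Logistic TF-IDF" baseline is presumably the standard sklearn-style pipeline. A minimal sketch with placeholder data and illustrative hyperparameters (the post doesn't show the actual setup):

```python
# Minimal sketch of a logistic TF-IDF intent baseline.
# Texts/labels are placeholders; hyperparameters are illustrative,
# not the ones behind the numbers above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["block my card", "my card got stolen", "what is my balance", "show my balance"]
train_labels = ["card_block", "card_block", "balance", "balance"]

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),  # unigram + bigram features
    LogisticRegression(max_iter=1000),    # linear classifier over sparse TF-IDF
)
baseline.fit(train_texts, train_labels)
print(baseline.predict(["someone stole my card"]))  # likely ['card_block']
```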
Across the others:

- CLINC150 / HWU64 → not always higher accuracy, but ~3–4x smaller models with competitive performance
- MASSIVE → quality + size wins over the logistic baseline, at ~6x smaller
Key pattern:
Dynamic Seed finds much smaller architectures
that stay competitive — and sometimes outperform strong baselines
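The post doesn't show how Seed actually searches, so here's a deliberately naive sketch of the general idea: sample small candidate shapes, evaluate each cheaply, and keep the smallest one within tolerance of the best score. `train_eval` and the search space are my assumptions, not a Seed API:

```python
# Hypothetical illustration of "dynamic discovery" -- not Seed's algorithm.
import random

def count_params(dims):
    # dims like [in_dim, h1, ..., out_dim]: weight matrices + biases of a plain MLP
    return sum(a * b + b for a, b in zip(dims, dims[1:]))

def search(train_eval, in_dim, out_dim, trials=30, tol=0.005):
    # train_eval(dims) is an assumed callback: build the MLP, train briefly,
    # and return validation accuracy.
    results = []
    for _ in range(trials):
        hidden = [random.choice([8, 16, 32, 64]) for _ in range(random.randint(1, 2))]
        dims = [in_dim] + hidden + [out_dim]
        results.append((train_eval(dims), dims))
    best_acc = max(acc for acc, _ in results)
    # "smallest model that still wins": fewest params among near-best candidates
    return min((count_params(d), d) for acc, d in results if acc >= best_acc - tol)
```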
This isn’t about bigger models.
It’s about:
finding the smallest model that still wins
Traditional approach:
scale size → hope for gains
Seed:
search structure → compress intelligently
Some takeaways:

- Static architectures often lose, even to the logistic baseline (3 of 4 datasets here)
- Dynamic discovery consistently improves efficiency
- Distillation helps stabilize small models (sketch after this list)
- Structure matters more than uniform scaling
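On the distillation point: here's a sketch of the standard soft-target distillation loss (Hinton-style), which is one plausible way to stabilize a small student. Not necessarily Seed's exact recipe:

```python
# Standard knowledge-distillation loss (soft targets + hard labels).
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL between temperature-scaled distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)  # ordinary supervised term
    return alpha * soft + (1 - alpha) * hard
```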
This is the direction behind Seed AutoArch:
automatically discovering efficient models for real tasks
Not AGI
Not “we solved NLU”
But a real signal that:
structure > scale
What do you make of this?