
[R] Empirical evidence for a primitive layer in small language models — 18 experiments across 4 architectures

Reddit r/MachineLearning / 3/17/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The article reports 18 experiments probing small language models (360M–1B parameters) across four architectures (Qwen 2.5, Gemma 3, LLaMA 3.2, SmolLM2) to test for a primitive-layer structure in model representations.
  • It identifies a consistent activation gap between Layer 0a (scaffolding primitives: SOMEONE, TIME, PLACE) and Layer 0b (content primitives: FEAR, GRIEF, JOY, ANGER), averaging +0.245 across models and persisting in all four architectures.
  • Eleven pre-registered primitive compositions (operator + seed) matched predicted Layer 1 concepts in 3 of 4 models (e.g., WANT + GRIEF → longing/yearning; TIME + NOSTALGIA → memory/reminiscence; FEEL + GRIEF → heartbreak/sorrow).
  • The scaling pattern shows the gap is largest in the smallest model and narrows with size, not because content primitives weaken but because larger models gain phenomenological access to scaffolding primitives, possibly contributing to capability jumps at scale.
  • Limitations include small n per primitive, a classifier that is the same class of model being measured (circularity), and an open mechanistic explanation; the authors present preliminary findings and provide reproducible code and data via Ollama, with links to the paper and repo.

We ran 18 experiments probing small language models (360M–1B parameters) with inputs ranging from random phonemes to Wierzbicka's universal semantic primitives.

The main finding: a consistent activation gap exists between what we term Layer 0a (scaffolding primitives: SOMEONE, TIME, PLACE) and Layer 0b (content primitives: FEAR, GRIEF, JOY, ANGER). The gap averaged +0.245 across all four tested architectures (Qwen 2.5, Gemma 3, LLaMA 3.2, SmolLM2) and was directionally consistent in every model.
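The post doesn't spell out how the gap score is computed, so here is a minimal sketch under stated assumptions: each primitive gets an embedding from the model under test, is scored by the mean absolute activation of that embedding, and the gap is the difference between the mean content-primitive score and the mean scaffolding-primitive score. The scoring rule and the toy embedder are hypothetical stand-ins, not the authors' method; in the real setup the embeddings would come from the probed model (e.g. via a local Ollama endpoint).

```python
import numpy as np

SCAFFOLDING = ["SOMEONE", "TIME", "PLACE"]   # Layer 0a
CONTENT = ["FEAR", "GRIEF", "JOY", "ANGER"]  # Layer 0b

def activation_score(embedding: np.ndarray) -> float:
    """Score one primitive by the mean absolute activation of its embedding.
    (Hypothetical metric -- the post does not specify the scoring rule.)"""
    return float(np.mean(np.abs(embedding)))

def layer0_gap(embed) -> float:
    """Mean content-primitive score minus mean scaffolding-primitive score.
    A positive gap (cf. the reported +0.245) means content primitives
    activate more strongly than scaffolding primitives."""
    content = np.mean([activation_score(embed(w)) for w in CONTENT])
    scaffold = np.mean([activation_score(embed(w)) for w in SCAFFOLDING])
    return float(content - scaffold)

# Toy embedder standing in for a real model: content primitives are
# drawn with a larger spread, so the gap comes out positive by design.
rng = np.random.default_rng(0)
toy = {w: rng.normal(0.0, 1.5 if w in CONTENT else 1.0, size=64)
       for w in SCAFFOLDING + CONTENT}
print(round(layer0_gap(toy.get), 3))
```

Swapping `toy.get` for a function that fetches real embeddings from each of the four models would let you check whether the sign of the gap is consistent across architectures, which is the post's core claim.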

Additionally, 11 pre-registered primitive compositions (operator + seed) matched predicted Layer 1 concepts in 3 of 4 models (e.g., WANT + GRIEF → longing/yearning, TIME + NOSTALGIA → memory/reminiscence, FEEL + GRIEF → heartbreak/sorrow).
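The composition rule isn't stated either; one plausible reading, sketched below with made-up 3-d vectors, is that the two primitives' embeddings are combined (here by simple vector addition, an assumption) and the predicted Layer 1 concept is whichever candidate lies closest by cosine similarity. The candidate words and the embedder are illustrative only.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def compose(operator: str, seed: str, candidates: list, embed) -> str:
    """Combine two primitives by vector addition (assumed rule) and return
    the candidate Layer 1 concept nearest the composite by cosine similarity."""
    composite = embed(operator) + embed(seed)
    return max(candidates, key=lambda c: cosine(embed(c), composite))

# Toy embeddings: 'longing' is deliberately placed near WANT + GRIEF.
vecs = {
    "WANT":    np.array([1.0, 0.0, 0.0]),
    "GRIEF":   np.array([0.0, 1.0, 0.0]),
    "longing": np.array([0.7, 0.7, 0.1]),
    "memory":  np.array([0.0, 0.1, 1.0]),
}
print(compose("WANT", "GRIEF", ["longing", "memory"], vecs.get))  # → longing
```

Because the compositions were pre-registered, the interesting test is whether the nearest-neighbor match agrees with the prediction made before looking at the model, as in the WANT + GRIEF → longing/yearning example.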

The scaling pattern is the finding we're most uncertain about but find most interesting: the gap is largest in the smallest model and narrows as scale increases, not because content primitives weaken but because larger models develop phenomenological access to scaffolding primitives too. This may partly explain capability jumps at scale.

All experiments are reproducible locally via Ollama. No API keys required. Code and data in the repo.
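For local reproduction, the setup presumably amounts to pulling the four models and running the repo's scripts against a local Ollama server. The exact model tags below are assumptions (small public tags in the 360M–1B range matching the four architectures), not a list taken from the repo:

```shell
# Assumed Ollama tags for the four tested architectures (360M-1B range)
ollama pull qwen2.5:0.5b
ollama pull gemma3:1b
ollama pull llama3.2:1b
ollama pull smollm2:360m
```

Ollama serves a local REST API on port 11434 by default, so no API keys are involved; check the repo's README for the actual model tags and experiment entry points.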

Paper: https://github.com/dchisholm125/graph-oriented-generation/blob/main/SRM_PAPER.md

Repo: https://github.com/dchisholm125/graph-oriented-generation

Limitations we're aware of: small n per primitive, the classifier is the same class of model being measured (circularity), and the mechanistic explanation is completely open. We're publishing preliminary findings, not definitive claims.

submitted by /u/BodeMan5280