Preservation Is Not Enough for Width Growth: Regime-Sensitive Selection of Dense LM Warm Starts

arXiv cs.AI / 4/7/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper studies how to choose effective widened warm-start states for dense language-model width growth, showing that simply preserving the zero-step behavior is insufficient for selecting the best candidate.

Abstract

Width expansion offers a practical route to reuse smaller causal-language-model checkpoints, but selecting a widened warm start is not solved by zero-step preservation alone. We study dense width growth as a candidate-selection problem over full training states, including copied weights, optimizer moments, and scheduler state. In a small-scale TinyStories proxy, we compare exact-copy, perturbative, asymmetric-reset, and structured non-clone warm starts under matched continuation budgets. We evaluate zero-step preservation, short-lag probe metrics, and downstream continuation utility in deterministic and stochastic regimes. The picture is mixed and partially replicated through a reduced-pool seed-1 check. Exact-copy symmetric warm starts rank first in every completed 16-step probe and in the completed stochastic 128-step continuations at seed-0 steps 1000 and 2000 plus reduced seed-1 step 2000. By contrast, the structured non-clone challenger wins deterministic 128-step continuation. Early escape from the inherited cloned subspace is therefore not a universal selector: it helps in long deterministic continuation, but it misleads at short lag and under stochastic continuation. The result is narrow but useful: for dense width growth at this scale, preservation is not a universal ranking criterion, and the best replacement signal depends on both regime and lag budget.

Black Hat Asia

AI Business

Fully Automated Website 2026-04-11: The Scoreboard — Visual Judge Score Comparison on the Homepage

Dev.to

Human-Aligned Decision Transformers for satellite anomaly response operations with ethical auditability baked in

Dev.to

That Smoking-Gun Video? It's Not Evidence. It's a Suspect.

Dev.to

AI Citation Registries and Website-Based Publishing Constraints

Dev.to

Preservation Is Not Enough for Width Growth: Regime-Sensitive Selection of Dense LM Warm Starts

Key Points

Abstract

Related Articles

Black Hat Asia

Fully Automated Website 2026-04-11: The Scoreboard — Visual Judge Score Comparison on the Homepage

Human-Aligned Decision Transformers for satellite anomaly response operations with ethical auditability baked in

That Smoking-Gun Video? It's Not Evidence. It's a Suspect.

AI Citation Registries and Website-Based Publishing Constraints

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer

Key Points

Abstract

Related Articles

Black Hat Asia

Fully Automated Website 2026-04-11: **The Scoreboard — Visual Judge Score Comparison on the Homepage**

Human-Aligned Decision Transformers for satellite anomaly response operations with ethical auditability baked in

That Smoking-Gun Video? It's Not Evidence. It's a Suspect.

AI Citation Registries and Website-Based Publishing Constraints

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer

Fully Automated Website 2026-04-11: The Scoreboard — Visual Judge Score Comparison on the Homepage