Neural Collapse Dynamics: Depth, Activation, Regularisation, and Feature Norm Threshold

arXiv cs.LG / 4/2/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper studies how neural collapse (the convergence of penultimate-layer features to a simplex equiangular tight frame) begins, moving beyond equilibrium-only understanding to characterise onset dynamics.
  • It proposes a predictive regularity: neural collapse occurs when the mean feature norm reaches a model- and dataset-specific critical threshold f_n*, which is largely invariant to training conditions within each (model, dataset) pair.
  • In standard training trajectories, the feature-norm threshold crossing consistently precedes neural collapse onset by a mean lead time of 62 epochs (MAE 24 epochs), enabling a practical timing predictor (see the sketch after this list).
  • A gradient-flow intervention experiment indicates that f_n* behaves as a stable attractor: perturbations to feature scale self-correct during training and converge back to the same threshold value regardless of the perturbation's direction.
  • Across the (architecture) × (dataset) grid, the strongest finding is a large architecture effect on MNIST (ResNet-20 yields f_n* = 5.867, a +458% effect versus only +68% on CIFAR-10) and strongly non-additive architecture–dataset interactions; weight decay additionally defines a three-regime phase diagram, and width monotonically accelerates collapse.
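Since f_n* is reported to be stable within each (model, dataset) pair, the timing predictor reduces to tracking the mean penultimate-layer feature norm each epoch and flagging its first crossing of a previously calibrated threshold. Below is a minimal PyTorch sketch of that bookkeeping; `model.features` is a hypothetical accessor for the penultimate-layer activations (adapt it to your architecture, e.g. via a forward hook), and the paper's own calibration procedure for f_n* is not reproduced here.

```python
import torch

@torch.no_grad()
def mean_feature_norm(model, loader, device="cpu"):
    # Mean L2 norm of penultimate-layer features over a dataset.
    # `model.features(x)` is a hypothetical accessor; assumes the
    # loader yields (input, label) batches.
    total, count = 0.0, 0
    for x, _ in loader:
        feats = model.features(x.to(device))    # shape: (batch, d)
        total += feats.norm(dim=1).sum().item()
        count += feats.size(0)
    return total / count

def first_crossing(norm_history, f_n_star):
    # First epoch at which the tracked mean feature norm crosses
    # f_n* (in either direction); None if it never crosses.
    for t in range(1, len(norm_history)):
        before = norm_history[t - 1] - f_n_star
        after = norm_history[t] - f_n_star
        if after == 0 or before * after < 0:
            return t
    return None
```

The crossing test is deliberately direction-agnostic, since the summary does not pin down whether f_n approaches f_n* from above or from below in a given setup.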

Abstract

Neural collapse (NC) -- the convergence of penultimate-layer features to a simplex equiangular tight frame -- is well understood at equilibrium, but the dynamics governing its onset remain poorly characterised. We identify a simple and predictive regularity: NC occurs when the mean feature norm reaches a model- and dataset-specific critical value, f_n*, that is largely invariant to training conditions. This value concentrates tightly within each (model, dataset) pair (CV < 8%); training dynamics primarily affect the rate at which f_n approaches f_n*, rather than the value itself. In standard training trajectories, the crossing of f_n through f_n* consistently precedes NC onset, providing a practical predictor with a mean lead time of 62 epochs (MAE 24 epochs). A direct intervention experiment confirms that f_n* is a stable attractor of the gradient flow -- perturbations to feature scale are self-corrected during training, with convergence to the same value regardless of direction (p > 0.2). Completing the (architecture) × (dataset) grid reveals the paper's strongest result: ResNet-20 on MNIST gives f_n* = 5.867 -- a +458% architecture effect versus only +68% on CIFAR-10. The grid is strongly non-additive; f_n* cannot be decomposed into independent architecture and dataset contributions. Four structural regularities emerge: (1) depth has a non-monotonic effect on collapse speed; (2) the activation function jointly determines both collapse speed and f_n*; (3) weight decay defines a three-regime phase diagram -- too little slows collapse, an optimal range is fastest, and too much prevents collapse entirely; (4) width monotonically accelerates collapse while shifting f_n* by at most 13%. These results establish feature-norm dynamics as an actionable diagnostic for predicting NC timing, suggesting that norm-threshold behaviour is a general mechanism underlying delayed representational reorganisation in deep networks.
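For readers who want to track NC onset alongside the feature norm, one common neural-collapse diagnostic (the equiangularity component of the simplex-ETF criterion; the abstract does not specify the paper's exact metric suite) measures how far the pairwise cosines of the centred class means deviate from the ETF value of -1/(K-1). A minimal PyTorch sketch, assuming `features` is an (N, d) tensor of penultimate-layer activations and `labels` an (N,) tensor of integer class indices:

```python
import torch

@torch.no_grad()
def etf_deviation(features, labels, num_classes):
    # Deviation of centred class means from a simplex ETF.
    # In a simplex equiangular tight frame over K classes, every
    # pair of centred, unit-normalised class means has cosine
    # similarity exactly -1/(K-1).
    means = torch.stack([features[labels == k].mean(dim=0)
                         for k in range(num_classes)])
    centred = means - means.mean(dim=0)          # remove global mean
    unit = centred / centred.norm(dim=1, keepdim=True)
    cosines = unit @ unit.T                      # pairwise cosines
    target = -1.0 / (num_classes - 1)
    off_diag = cosines[~torch.eye(num_classes, dtype=torch.bool)]
    return (off_diag - target).abs().max().item()
```

A value near zero indicates the class means have reached the equiangular configuration; plotting this quantity per epoch against the f_n trajectory is one way to visualise the lead time the paper reports.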