Spectral Signatures of Data Quality: Eigenvalue Tail Index as a Diagnostic for Label Noise in Neural Networks
arXiv cs.LG · March 31, 2026
Key Points
- The study tests whether spectral properties of neural network weight matrices can predict test accuracy, and finds that the eigenvalue tail index α at the bottleneck layer strongly tracks accuracy under controlled label-noise variation (leave-one-out R² = 0.984), far outperforming conventional metrics such as the Frobenius norm (LOO R² = 0.149).
- This predictive relationship is reported to generalize across three architectures (MLP, CNN, ResNet-18) and two datasets (MNIST, CIFAR-10) when the dominant factor is label corruption.
- When hyperparameters are varied while data quality is held fixed, both spectral measures (including tail α) and conventional measures become weak predictors of accuracy (R² < 0.25), and simple baselines slightly outperform spectral ones.
- The authors therefore position the tail index as a diagnostic for data quality—detecting label noise and training-set degradation—rather than a universal generalization predictor.
- A calibrated detector trained on synthetic noise is reported to identify real annotation errors in CIFAR-10N, detecting a 9% noise rate with 3% error. The work ties the effect to the information-processing bottleneck layer and BBP phase-transition concepts, while finding the eigenvalue level spacing ratio ⟨r⟩ uninformative due to Wishart universality.
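The tail index α discussed above is typically estimated by fitting a power law to the large-eigenvalue tail of a layer's weight correlation matrix WᵀW. A minimal sketch using a Hill-type estimator on the top-k eigenvalues — the paper's exact fitting procedure may differ, and `k` is an illustrative choice:

```python
import numpy as np

def tail_alpha(weights: np.ndarray, k: int = 20) -> float:
    """Estimate the power-law tail index alpha of the eigenvalue
    spectrum of W^T W via a Hill estimator over the top-k eigenvalues.
    Illustrative only; the paper's estimator may differ in detail."""
    # Eigenvalues of the (symmetric, PSD) correlation matrix W^T W
    eigs = np.linalg.eigvalsh(weights.T @ weights)
    top = np.sort(eigs)[-k:]          # k largest eigenvalues
    x_min = top[0]                    # tail cutoff
    # Hill estimator: alpha = 1 + k / sum_i log(lambda_i / x_min)
    return 1.0 + k / np.sum(np.log(top / x_min))

# Example: a random Gaussian layer, whose spectrum follows the
# Marchenko-Pastur (Wishart) law rather than a heavy-tailed one.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 128)) / np.sqrt(512)
print(f"estimated alpha: {tail_alpha(W):.2f}")
```

Per the paper's framing, layers trained on clean data develop heavier spectral tails (smaller α) than this random baseline, so tracking α at the bottleneck layer is what serves as the data-quality signal.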