I Found a Hidden Ratio in Transformers That Predicts Geometric Stability [R]

Reddit r/MachineLearning / 5/12/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The post reports a Lyapunov spectral analysis finding that the ratio of MLP to attention spectral norms in decoder transformers can predict whether a model collapses to a rank-1 state by its final layers.
  • It claims geometric stability is best maintained when this spectral ratio is kept in the range of roughly 0.5 to 2.
  • The work is presented as a rule-of-thumb for assessing/steering training dynamics toward stable representations in transformer decoders.
  • A GitHub repository is provided for readers to examine or reproduce the analysis (https://github.com/yousef-rafat/the-1-1-rule).

I analyzed several decoder transformer models using Lyapunov spectral analysis and found that the ratio of the MLP spectral norm to the attention spectral norm strongly indicates whether a model's representations will eventually collapse to rank-1 by the final layers.
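
To make the quantity concrete, here's a toy version of the measurement on GPT-2 (the repo has the full analysis; this simplified sketch just takes the spectral norm of each block's MLP down-projection over that of its attention output projection, which is one of several reasonable choices of matrices):

```python
# Per-layer MLP/attention spectral-norm ratio on GPT-2 (simplified sketch;
# the exact matrices and normalization used in the repo may differ).
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")

with torch.no_grad():
    for i, block in enumerate(model.h):
        # ord=2 gives the largest singular value, i.e. the spectral norm.
        attn_norm = torch.linalg.matrix_norm(block.attn.c_proj.weight, ord=2)
        mlp_norm = torch.linalg.matrix_norm(block.mlp.c_proj.weight, ord=2)
        ratio = (mlp_norm / attn_norm).item()
        print(f"layer {i:2d}  mlp/attn spectral ratio = {ratio:.3f}")
```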

I found that keeping this spectral ratio roughly between 0.5 and 2 works best for keeping the model geometrically stable through the final layers.
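
And a rough way to eyeball the collapse itself: track an effective rank of the token representations at each layer. This entropy-based diagnostic is just a stand-in, not the Lyapunov analysis itself; values near 1 at the final layers would indicate rank-1 collapse:

```python
# Effective rank (exp of the entropy of normalized singular values) of the
# hidden states at each layer; a stand-in diagnostic for rank collapse.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

for i, h in enumerate(out.hidden_states):
    x = h[0]                        # (seq_len, hidden) token matrix
    s = torch.linalg.svdvals(x)     # singular values of the layer's output
    p = s / s.sum()
    eff_rank = torch.exp(-(p * torch.log(p + 1e-12)).sum()).item()
    print(f"layer {i:2d}  effective rank ≈ {eff_rank:.2f}")
```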

GitHub repo: https://github.com/yousef-rafat/the-1-1-rule

submitted by /u/Otaku_7nfy