Structural Sensitivity in Compressed Transformers: Error Propagation, Lyapunov Stability, and Formally Verified Bounds
arXiv cs.LG / 2026-03-24
Key points
- The study finds extreme structural sensitivity in compressed transformers: compressing a single weight matrix in GPT-2 Small can increase perplexity by about 20,000x, with per-matrix sensitivity spanning roughly five orders of magnitude.
- Across five transformer architectures (117M to 8B parameters), the authors identify a consistent hierarchy of compression fragility: early-layer MLP up-projection matrices are catastrophically sensitive, while value projections can compress with minimal performance loss.
- Using Lyapunov stability theory, the paper argues that residual connections contract compression-induced errors in relative terms: the residual stream's hidden-state norm grows faster than the injected error, providing a theoretical mechanism for partial compression tolerance.
- The authors also show that error contraction alone does not predict degradation: architecture-specific redundancy matters too, as illustrated by a hybrid model that degrades far less than expected despite higher measured error amplification.
- The work includes ten machine-checked Lean 4 theorems that formally bound per-matrix error propagation with no unproven steps, plus empirical validation via a per-matrix robustness score (the Compression Fragility Index) and downstream task benchmarks.
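The Lyapunov-style argument about residual streams can be made concrete with a hedged sketch (notation mine, not taken from the paper): if the hidden-state norm grows at a faster rate than the compression-induced error, the error shrinks relative to the signal even though it grows in absolute terms.

```latex
% Sketch under assumed notation: h_\ell is the residual-stream state at
% layer \ell, e_\ell the compression-induced error, and the per-layer
% growth rates satisfy L_\ell < g_\ell.
h_{\ell+1} = h_\ell + f_\ell(h_\ell), \qquad
\|e_{\ell+1}\| \le (1 + L_\ell)\,\|e_\ell\|, \qquad
\|h_{\ell+1}\| \ge (1 + g_\ell)\,\|h_\ell\|.
% Then the relative error contracts geometrically:
\frac{\|e_{\ell+1}\|}{\|h_{\ell+1}\|}
  \le \frac{1 + L_\ell}{1 + g_\ell} \cdot \frac{\|e_\ell\|}{\|h_\ell\|}
  < \frac{\|e_\ell\|}{\|h_\ell\|}.
```

This also suggests why contraction alone cannot be the whole story, as the hybrid-model result above indicates: the bound only controls relative error growth, while the mapping from relative error to task degradation depends on architecture-specific redundancy.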
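The per-matrix sensitivity measurements above can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes truncated SVD as the compression operator and uses the relative output error of a single matrix, averaged over random probe inputs, as a stand-in for a fragility score; the paper's actual Compression Fragility Index is likely defined differently.

```python
import numpy as np

def truncate_svd(W, rank):
    """Rank-`rank` approximation of W via truncated SVD (the assumed
    compression operator in this sketch)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def fragility(W, X, rank):
    """Hypothetical per-matrix fragility score: relative output error
    after replacing W with its rank-`rank` approximation, measured
    over a batch of probe inputs X (columns)."""
    Y = W @ X
    Y_hat = truncate_svd(W, rank) @ X
    return np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))   # stand-in for one transformer weight matrix
X = rng.normal(size=(64, 16))   # random probe activations

lossless = fragility(W, X, rank=64)  # full rank: error near machine epsilon
lossy = fragility(W, X, rank=16)     # aggressive compression: large error
```

Ranking every weight matrix in a model by such a score, and then measuring downstream perplexity after compressing each one in isolation, is one way the reported sensitivity hierarchy (fragile up-projections vs. robust value projections) could be probed empirically.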

