ActTail: Global Activation Sparsity in Large Language Models
arXiv cs.LG / March 16, 2026
Key Points
- ActTail introduces a TopK, magnitude-based activation sparsity method with globally allocated budgets for large language models, aiming to reduce compute and memory movement during inference (see the first sketch after this list).
- It explicitly accounts for heterogeneity across transformer weight matrices by fitting a heavy-tail exponent to each projection's empirical spectral density and using it to set projection-specific sparsity budgets (see the second sketch below).
- The paper derives a theoretical relationship between the activation sparsity ratio and the heavy-tail exponent under the HT-SR (heavy-tailed self-regularization) regime, grounding sparsity decisions in theory rather than heuristic rules.
- Experiments on LLaMA and Mistral show improved perplexity and downstream task performance at high sparsity, with 80% sparsity achieving substantial relative reductions (21.8% on LLaMA-2-7B, 40.1% on LLaMA-2-13B, and 9.4% on Mistral-7B).
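The global allocation idea can be pictured as ranking activation magnitudes across all projections against one shared budget, rather than thresholding each projection independently. Below is a minimal sketch of that mechanism; the function name `global_topk_mask`, the projection names, and the per-batch granularity are illustrative assumptions, not ActTail's actual implementation.

```python
# Minimal sketch: magnitude-based TopK activation sparsity with a
# globally shared budget across projections. Names and granularity
# are assumptions, not taken from the paper.
import torch

def global_topk_mask(acts: dict[str, torch.Tensor],
                     keep_ratio: float) -> dict[str, torch.Tensor]:
    """Keep the largest-magnitude activations across *all* projections.

    acts: projection name -> activation tensor for one token batch.
    keep_ratio: fraction of activations kept globally (1 - sparsity).
    """
    # Flatten all activations into one vector and rank them jointly,
    # so high-magnitude projections can claim more of the budget.
    flat = torch.cat([a.abs().flatten() for a in acts.values()])
    k = max(1, int(keep_ratio * flat.numel()))
    # Global threshold: the k-th largest magnitude anywhere in the batch.
    threshold = torch.topk(flat, k, largest=True).values.min()
    return {name: (a.abs() >= threshold) for name, a in acts.items()}

# Usage: zero out activations below the global threshold (80% sparsity).
acts = {"q_proj": torch.randn(4, 4096), "up_proj": torch.randn(4, 11008)}
masks = global_topk_mask(acts, keep_ratio=0.2)
sparse_acts = {name: a * masks[name] for name, a in acts.items()}
```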
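On the heavy-tail side, HT-SR-style analyses estimate the exponent by fitting a power law to the top eigenvalues of each projection's correlation matrix W^T W. The sketch below uses a Hill estimator and a simple proportional budget rule; both the estimator choice and the alpha-to-budget mapping are stand-ins for the paper's actual fitting procedure and derived sparsity relationship.

```python
# Sketch: per-projection heavy-tail exponent from the empirical spectral
# density, then a hypothetical allocation of keep ratios. The Hill
# estimator and the linear alpha-to-budget rule are placeholders for
# the paper's actual fit and theoretical relationship.
import torch

def heavy_tail_alpha(W: torch.Tensor, tail_frac: float = 0.1) -> float:
    """Hill estimate of the power-law exponent of the ESD of W^T W."""
    eigs = torch.linalg.svdvals(W) ** 2  # eigenvalues of W^T W, descending
    k = max(2, int(tail_frac * eigs.numel()))
    tail = eigs[:k]                      # heaviest k eigenvalues
    # Hill estimator: alpha ~= 1 + k / sum_i log(lambda_i / lambda_k).
    log_ratios = torch.log(tail / tail[-1]).sum().item()
    return 1.0 + k / max(log_ratios, 1e-12)

def allocate_keep_ratios(weights: dict[str, torch.Tensor],
                         global_keep: float) -> dict[str, float]:
    """Hypothetical rule: projections with larger alpha (lighter tails)
    are assumed to need larger keep ratios; ratios are rescaled so the
    average matches the global target."""
    alphas = {name: heavy_tail_alpha(W) for name, W in weights.items()}
    mean_alpha = sum(alphas.values()) / len(alphas)
    return {name: min(1.0, global_keep * a / mean_alpha)
            for name, a in alphas.items()}

# Usage: budget two projections for 20% average keep (80% sparsity).
weights = {"q_proj": torch.randn(512, 512), "up_proj": torch.randn(1024, 512)}
print(allocate_keep_ratios(weights, global_keep=0.2))
```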