IMPACT: Importance-Aware Activation Space Reconstruction

arXiv stat.ML / 4/22/2026


Key Points

  • The paper argues that low-rank weight compression often fails for LLMs because the weights themselves are frequently not low-rank.
  • It proposes IMPACT, which instead compresses by reconstructing activations, whose more pronounced low-rank structure better matches how LLMs behave in practice.
  • IMPACT introduces an importance-aware optimization that weights activation reconstruction by gradient-based importance, producing a closed-form solution based on an importance-weighted activation covariance matrix.
  • Experiments across multiple models and tasks show IMPACT achieves up to 55.4% greater model size reduction than existing compression baselines while keeping accuracy comparable or better.
  • Overall, the method directly connects compression choices to expected performance impact, aiming to improve deployability in resource-constrained environments.

Abstract

Large language models (LLMs) achieve strong performance across diverse domains but remain difficult to deploy in resource-constrained environments due to their size. Low-rank compression is a common remedy, typically minimizing weight reconstruction error under the assumption that weights are low-rank. However, this assumption often does not hold in LLMs. In contrast, LLM activations exhibit a more pronounced low-rank structure, motivating approaches that minimize activation reconstruction error. This shift alone, however, is not sufficient: different activation dimensions contribute unequally to model performance, and treating them uniformly can lead to accuracy loss. We introduce IMPACT, an importance-aware activation reconstruction framework that links compression to its effect on model performance. IMPACT formulates compression as an optimization problem that integrates activation structure with gradient-based importance, deriving a closed-form solution where reconstruction bases arise from an importance-weighted activation covariance matrix. This yields low-rank compression explicitly optimized for accuracy preservation. Experiments across multiple models and tasks demonstrate that IMPACT achieves up to 55.4% greater model size reduction while maintaining accuracy comparable to or better than state-of-the-art baselines.
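The closed-form solution described in the abstract (reconstruction bases from an importance-weighted activation covariance matrix) can be illustrated with a small sketch. All names and the synthetic data below are assumptions for illustration, not the paper's code: the idea shown is to scale each activation dimension by the square root of its importance score, take the top eigenvectors of the resulting covariance as the rank-r reconstruction basis, and undo the scaling afterward, so that squared reconstruction error is minimized under the importance weighting rather than uniformly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 2000, 32, 8  # samples, activation dim, target rank (illustrative values)

# Synthetic stand-ins: X holds n activation vectors of dimension d;
# w holds per-dimension importance scores (gradient-based in the paper).
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d)) * 0.3
w = rng.uniform(0.1, 2.0, size=d)

s = np.sqrt(w)                    # scaling so squared error becomes importance-weighted
Xs = X * s                        # activations in the importance-weighted space
C = Xs.T @ Xs / n                 # importance-weighted activation covariance
evals, evecs = np.linalg.eigh(C)  # eigenvalues in ascending order
U = evecs[:, -r:]                 # top-r eigenvectors form the reconstruction basis

X_hat = (Xs @ U @ U.T) / s        # project onto the basis, then undo the scaling

# Reconstruction error measured under the importance weights.
werr = np.sum(w * (X - X_hat) ** 2)
```

Because the basis is optimal in the scaled space, this weighted rank-r reconstruction never does worse, under the importance-weighted error, than plain rank-r PCA on the unweighted covariance; that gap is what motivates weighting the covariance in the first place.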