ELAS: Efficient Pre-Training of Low-Rank Large Language Models via 2:4 Activation Sparsity

arXiv cs.LG / 5/6/2026


Key Points

  • The paper introduces ELAS, a framework for efficiently pre-training low-rank LLMs by applying 2:4 structured sparsity to activations, which existing low-rank methods leave dense and full-rank.
  • ELAS swaps the activation function in the low-rank feed-forward networks for squared ReLU, then applies NVIDIA-hardware-friendly 2:4 structured sparsity to the activations produced by that operation (see the sketch after this list).
  • Experiments on LLaMA models (60M to 1B parameters) show ELAS preserves model performance with minimal degradation compared with non-sparse baselines.
  • The method provides training and inference speedups and reduces activation memory overhead, with the benefits becoming especially pronounced for large batch sizes.
  • The authors state the code is publicly available via the ELAS Repo, enabling replication and further experimentation.
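
Conceptually, the 2:4 constraint keeps at most two nonzero values in every contiguous group of four elements. Below is a minimal PyTorch sketch of that masking step, not the authors' implementation: the function name and the grouping along the last dimension are illustrative assumptions, and a plain mask like this only reproduces the sparsity pattern; the reported speedups rely on NVIDIA's sparse tensor core kernels, not on masking alone.

```python
import torch

def sparsify_2_to_4(x: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude values in every contiguous group of 4
    along the last dimension and zero the rest (2:4 structured sparsity)."""
    orig_shape = x.shape
    groups = x.reshape(-1, 4)                       # view as groups of 4
    # indices of the 2 smallest-magnitude entries in each group
    _, drop_idx = groups.abs().topk(2, dim=-1, largest=False)
    mask = torch.ones_like(groups)
    mask.scatter_(-1, drop_idx, 0.0)                # zero the 2 smallest
    return (groups * mask).reshape(orig_shape)

# Example: activations whose hidden size is a multiple of 4
acts = torch.randn(8, 16)
sparse_acts = sparsify_2_to_4(acts)
assert (sparse_acts.reshape(-1, 4) != 0).sum(dim=-1).max() <= 2
```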

Abstract

Large Language Models (LLMs) have achieved remarkable capabilities, but their immense computational demands during training remain a critical bottleneck for widespread adoption. Low-rank training has received attention in recent years for its ability to significantly reduce training memory usage. Meanwhile, applying 2:4 structured sparsity to weights and activations to leverage NVIDIA GPUs' hardware support for the 2:4 structured sparse format has become a promising direction. However, existing low-rank methods often leave activation matrices at full rank, and these activations dominate memory consumption and limit throughput during large-batch training. Furthermore, directly applying sparsity to weights often leads to non-negligible performance degradation. To achieve efficient pre-training of LLMs, this paper proposes ELAS (Efficient pre-training of Low-rank LLMs via 2:4 Activation Sparsity), a framework that combines low-rank models with 2:4 activation sparsity. ELAS applies squared ReLU activation functions to the feed-forward networks of low-rank models and imposes 2:4 structured sparsity on the activations produced by the squared ReLU operation. We evaluate ELAS through pre-training experiments on LLaMA models ranging from 60M to 1B parameters. The results demonstrate that ELAS maintains performance with minimal degradation after applying 2:4 activation sparsity, while achieving training and inference acceleration. Moreover, ELAS reduces activation memory overhead, particularly at large batch sizes. Code is available at ELAS Repo.
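
To make the mechanism described in the abstract concrete, here is an illustrative sketch of a low-rank feed-forward block with a squared ReLU activation. The factorization into two thin linear layers per projection, the module and parameter names, and the exact placement of the 2:4 pruning step are assumptions for illustration; the paper's actual architecture and kernels may differ.

```python
import torch
import torch.nn as nn

class LowRankSquaredReLUFFN(nn.Module):
    """Illustrative low-rank feed-forward block: each projection is factored
    into two thin matrices (rank r), and the activation is squared ReLU,
    which drives many activations to exactly zero."""

    def __init__(self, d_model: int, d_ff: int, rank: int):
        super().__init__()
        # Low-rank factorization of the up- and down-projections (assumed layout)
        self.up_a = nn.Linear(d_model, rank, bias=False)
        self.up_b = nn.Linear(rank, d_ff, bias=False)
        self.down_a = nn.Linear(d_ff, rank, bias=False)
        self.down_b = nn.Linear(rank, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.up_b(self.up_a(x))          # low-rank up-projection
        h = torch.relu(h) ** 2               # squared ReLU: non-negative, sparse
        # In ELAS, h would additionally be pruned to a 2:4 pattern at this point
        # (e.g., with a routine like sparsify_2_to_4 above) before the
        # down-projection, so sparse tensor core kernels can be used.
        return self.down_b(self.down_a(h))

ffn = LowRankSquaredReLUFFN(d_model=512, d_ff=2048, rank=64)
out = ffn(torch.randn(4, 128, 512))          # (batch, sequence, d_model)
```

The design intuition from the abstract is that squared ReLU already produces many exact zeros, so enforcing the hardware-friendly 2:4 pattern on those activations costs little in accuracy while unlocking sparse-kernel speedups and reducing activation memory.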