SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention

arXiv cs.LG / 4/16/2026

Key Points

SparseBalance addresses a key problem in long-context sparse-attention training: distributed workloads become highly heterogeneous across both sequence length and sensitivity to sparsity, causing imbalance and reduced accuracy.
The proposed approach co-designs an algorithm and system by using workload-aware dynamic sparsity tuning with a bidirectional adjustment scheme to remove straggler effects while leveraging idle “bubbles” for better throughput.
SparseBalance further improves efficiency and stability via a sparsity-aware batching strategy that enables coarse-grained load balance across training steps.
Experiments show up to a 1.33× end-to-end speedup together with a 0.46% improvement in long-context capability on the LongBench benchmark, indicating that the efficiency gains do not come at the cost of accuracy.
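To make the idea of sparsity-aware batching concrete, here is a minimal sketch under stated assumptions. It is not the paper's algorithm: the cost model (`seq_len² × keep_ratio`), the function names, and the use of a greedy longest-processing-time heuristic are all hypothetical choices made for illustration. It only shows how knowing each sample's sparsity can change which samples are grouped onto the same worker.

```python
import heapq


def attention_cost(seq_len: int, keep_ratio: float) -> float:
    """Hypothetical cost model (an assumption, not from the paper):
    dense attention scales with seq_len^2, and sparse attention keeps
    only a `keep_ratio` fraction of that work."""
    return seq_len * seq_len * keep_ratio


def balance_batch(samples, num_workers):
    """Greedy longest-processing-time assignment.

    samples: list of (seq_len, keep_ratio) tuples.
    Returns (assignment, loads), where assignment[w] lists the samples
    given to worker w and loads[w] is that worker's estimated cost.
    """
    # Min-heap of (current_load, worker_id): the least-loaded worker pops first.
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]

    # Place the most expensive samples first.
    ordered = sorted(samples, key=lambda s: attention_cost(*s), reverse=True)
    for sample in ordered:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(sample)
        heapq.heappush(heap, (load + attention_cost(*sample), worker))

    loads = [sum(attention_cost(*s) for s in group) for group in assignment]
    return assignment, loads


samples = [(8000, 0.25), (4000, 0.5), (4000, 0.5), (2000, 1.0), (2000, 1.0)]
assignment, loads = balance_batch(samples, num_workers=2)
print(loads)
```

In this toy input, a long but very sparse sequence costs the same as two medium sequences, so a batching rule that looked only at sequence length would misjudge the load.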

Abstract

While sparse attention mitigates the computational bottleneck of long-context LLM training, its distributed training process exhibits extreme heterogeneity in both 1) sequence length and 2) sparsity sensitivity, leading to a severe imbalance problem and sub-optimal model accuracy. Existing algorithms and training frameworks typically focus on a single issue, failing to systematically co-optimize these two problems. Therefore, we propose SparseBalance, a novel algorithm-system co-design framework, which exploits the sparsity and sequence heterogeneity to optimize model accuracy and system efficiency jointly. First, we propose workload-aware dynamic sparsity tuning, which employs a bidirectional sparsity adjustment to eliminate stragglers and exploit inherent bubbles for free accuracy. Second, we propose a sparsity-aware batching strategy to achieve coarse-grained balance, which complements dynamic sparsity tuning. Experimental results demonstrate that SparseBalance achieves up to a 1.33× end-to-end speedup while still improving the long-context capability by 0.46% on the LongBench benchmark.
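The bidirectional adjustment described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration rather than the authors' method: the quadratic cost model, the choice of the mean cost as the balancing target, the clamping bounds, and the name `tune_keep_ratios` are all hypothetical. The sketch only shows the two directions of the adjustment: overloaded workers become sparser to stop being stragglers, while underloaded workers become denser to spend their idle time on extra attention.

```python
def tune_keep_ratios(seq_lens, base_keep, min_keep=0.05, max_keep=1.0):
    """Per-worker keep ratios (1 - sparsity) that move each worker's
    estimated attention cost toward a shared target.

    seq_lens:  tokens processed by each worker in this step.
    base_keep: the default fraction of attention that is kept.
    Assumes cost ~ seq_len^2 * keep_ratio (a hypothetical model).
    """
    costs = [n * n * base_keep for n in seq_lens]
    target = sum(costs) / len(costs)  # assumption: balance to the mean cost

    tuned = []
    for n in seq_lens:
        ideal = target / (n * n)  # keep ratio that would hit the target exactly
        # Clamp so that no worker drops below min_keep or exceeds dense attention.
        tuned.append(min(max_keep, max(min_keep, ideal)))
    return tuned


# Worker 0 holds a short sequence, worker 1 a long one.
keep = tune_keep_ratios([4000, 8000], base_keep=0.5)
print(keep)
```

Here the short-sequence worker is raised to dense attention (it hits the `max_keep` clamp before reaching the target), and the long-sequence worker is made sparser than the default. Because of the clamp, the two costs are closer than before but not necessarily equal.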
