GRASS: Gradient-based Adaptive Layer-wise Importance Sampling for Memory-efficient Large Language Model Fine-tuning
arXiv cs.CL / 2026/4/10
Key Points
- The paper proposes GRASS, a memory-efficient full-parameter fine-tuning framework that improves on layer-wise importance sampling by making it adaptive to both tasks and training stages.
- GRASS estimates layer importance using mean gradient norms, enabling sampling decisions that reflect how different layers matter at different points in training.
- It further adapts layer sampling probabilities during training, aiming to preserve or improve downstream performance relative to prior approaches that fix layer importance statically.
- The method includes a layer-wise optimizer state offloading technique that overlaps computation and communication to reduce GPU memory usage without significantly hurting training throughput.
- Experiments across multiple models and benchmarks show GRASS consistently outperforms existing state-of-the-art methods, with reported average accuracy gains up to 4.38 points and memory reductions up to 19.97%.
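The gradient-norm-based layer sampling described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the power-law weighting with a `temperature` parameter, and the without-replacement sampling scheme are all assumptions for illustration.

```python
import random

def layer_sampling_probs(grad_norms, temperature=1.0):
    """Turn per-layer mean gradient norms into sampling probabilities.

    Hypothetical weighting: probability proportional to norm**temperature.
    GRASS's exact importance estimator may differ.
    """
    weights = [g ** temperature for g in grad_norms]
    total = sum(weights)
    return [w / total for w in weights]

def sample_layers(probs, k, rng=random):
    """Sample k distinct layer indices, proportionally to probs,
    without replacement (illustrative scheme)."""
    indices = list(range(len(probs)))
    remaining = list(probs)
    chosen = []
    for _ in range(min(k, len(indices))):
        r = rng.random() * sum(remaining)
        acc = 0.0
        for i in range(len(indices)):
            acc += remaining[i]
            if r <= acc:
                chosen.append(indices[i])
                del indices[i]
                del remaining[i]
                break
    return chosen

# Example: layers with larger mean gradient norms are updated more often.
norms = [0.1, 0.4, 0.2, 0.3]          # mean gradient norm per layer
probs = layer_sampling_probs(norms)   # e.g. layer 1 gets probability 0.4
active = sample_layers(probs, k=2)    # 2 of 4 layers chosen this step
```

In an adaptive scheme like the one the paper describes, `norms` would be re-estimated periodically during training so the sampling distribution tracks how layer importance shifts across training stages.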
