
GAST: Gradient-aligned Sparse Tuning of Large Language Models with Data-layer Selection

arXiv cs.LG / 3/11/2026

Models & Research

Key Points

  • Parameter-Efficient Fine-Tuning (PEFT) is essential for adapting large language models at manageable cost, with recent sparse tuning methods reducing computational overhead through selective updates.
  • Existing methods focus on either layer-selective or data-selective tuning but typically ignore the varying contributions of different data points to individual model layers.
  • The proposed Gradient-aligned Sparse Tuning (GAST) method jointly optimizes data and layer selection to reduce redundancy by adaptively selecting impactful data points for each layer.
  • GAST integrates layer- and data-sparse strategies into a unified framework, outperforming baseline approaches and offering a more nuanced and effective PEFT solution.
  • Experimental results validate GAST's superior performance, highlighting its promise for advancing future parameter-efficient tuning research in large language models.


arXiv:2603.09865 (cs)
[Submitted on 10 Mar 2026]

Title: GAST: Gradient-aligned Sparse Tuning of Large Language Models with Data-layer Selection

Authors: Kai Yao and 7 other authors
Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become a key strategy for adapting large language models, with recent advances in sparse tuning reducing overhead by selectively updating parameters or training on subsets of the data. Existing approaches generally follow one of two paradigms: layer-selective methods, which fine-tune only critical layers to reduce computational load, and data-selective methods, which pick effective training subsets to improve training efficiency. However, current methods typically overlook the fact that different data points contribute to different model layers to varying degrees, and they often discard potentially valuable information from data perceived as low quality. To address these limitations, we propose Gradient-aligned Sparse Tuning (GAST), a method that performs selective fine-tuning along both the data and layer dimensions as part of a unified optimization strategy. GAST targets informational redundancy through a layer-wise data-sparse strategy that adaptively selects the most impactful data points for each layer, offering a more comprehensive solution than approaches restricted to a single dimension. Experiments demonstrate that GAST consistently outperforms baseline methods, establishing a promising direction for future research on PEFT strategies.
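
The abstract does not specify how gradient alignment is computed, but the core idea, scoring each training example's usefulness to each layer by how well its gradient aligns with an aggregate gradient direction, can be sketched. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the cosine-similarity scoring rule, the helper names (per_example_layer_grads, select_per_layer), and the per-layer budget k are all assumptions made for the sketch.

    # Hypothetical sketch of gradient-aligned, per-layer data selection.
    # Not the GAST authors' code; the scoring rule and names are assumptions.
    import torch
    import torch.nn.functional as F

    def per_example_layer_grads(model, loss_fn, xs, ys):
        """Per-example gradient of the loss w.r.t. each layer's parameters.

        Returns a list (one entry per example) of dicts mapping
        parameter name -> flattened gradient vector.
        """
        grads = []
        for x, y in zip(xs, ys):
            model.zero_grad()
            loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
            loss.backward()
            grads.append({
                name: p.grad.detach().flatten().clone()
                for name, p in model.named_parameters()
                if p.grad is not None
            })
        return grads

    def select_per_layer(grads, k):
        """For each layer, keep the k examples whose gradients align best
        (by cosine similarity) with the batch-mean gradient of that layer."""
        selected = {}
        for name in grads[0]:
            g = torch.stack([ex[name] for ex in grads])    # (N, D)
            mean_g = g.mean(dim=0, keepdim=True)           # (1, D)
            scores = F.cosine_similarity(g, mean_g, dim=1) # (N,)
            selected[name] = torch.topk(scores, k).indices # example ids
        return selected

In a full training loop one would then, for each layer, accumulate gradients only over that layer's selected examples before the optimizer step, so different layers learn from different data subsets. The naive per-example loop above is for clarity only; in practice per-example gradients are computed far more efficiently with torch.func (vmap over grad).
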
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2603.09865 [cs.LG]
  (or arXiv:2603.09865v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2603.09865

Submission history

From: Penglei Gao
[v1] Tue, 10 Mar 2026 16:28:48 UTC (382 KB)