Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures

arXiv cs.LG / 4/20/2026


Key Points

  • The paper introduces Aletheia, a gradient-guided method that selects the most task-relevant transformer layers for LoRA rather than applying adapters uniformly across all layers.
  • Aletheia uses a lightweight gradient probe to identify relevant layers and performs LoRA with asymmetric rank allocation only on those selected layers.
  • Across 81 experiment rows spanning 14 successful model variants from 8 architecture families (0.5B–72B parameters, including dense and Mixture-of-Experts models), Aletheia delivers a 15–28% training speedup (mean 23.1%).
  • The approach shows bounded extra forgetting and broadly matched downstream results on MMLU, GSM8K, and HumanEval; a second campaign reports preserved behavior, with one documented failed attempt (Pythia/GPT-NeoX).
  • Overall, the results support a practical "model economics" claim: intelligent layer selection can make LoRA fine-tuning significantly more efficient with limited degradation on the evaluated benchmarks.
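
The selection step described above can be sketched in a few lines. The snippet below assumes the gradient probe has already produced one L2 gradient norm per transformer layer; the scores, layer counts, rank budget, and the proportional-split rule for "asymmetric rank allocation" are illustrative assumptions, not details taken from the paper.

```python
import random

random.seed(0)

# Hypothetical probe output: per-layer gradient L2 norms collected from a
# single backward pass over a small probe batch (values are simulated here).
NUM_LAYERS = 24
grad_norms = [random.lognormvariate(0.0, 1.0) for _ in range(NUM_LAYERS)]

def select_layers(scores, k, total_rank, min_rank=2):
    """Keep the top-k layers by gradient norm and split a shared LoRA rank
    budget across them in proportion to their scores -- one plausible reading
    of gradient-guided selection with asymmetric rank allocation."""
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    mass = sum(scores[i] for i in top)
    # Each selected layer gets at least min_rank; the rest of the budget is
    # distributed proportionally to its probe score.
    return {i: max(min_rank, round(total_rank * scores[i] / mass)) for i in top}

# Adapt only 6 of 24 layers, sharing a budget of 48 rank units; every other
# layer stays frozen with no adapter at all.
ranks = select_layers(grad_norms, k=6, total_rank=48)
for layer, r in sorted(ranks.items()):
    print(f"layer {layer:2d}: LoRA rank {r}")
```

In a real fine-tuning run, the dictionary returned here would drive adapter construction, e.g. instantiating LoRA matrices of the given rank only on the selected layers' projection weights.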

Abstract

Low-Rank Adaptation (LoRA) has become the dominant parameter-efficient fine-tuning method for large language models, yet standard practice applies LoRA adapters uniformly to all transformer layers regardless of their relevance to the downstream task. We introduce Aletheia, a gradient-guided layer selection method that identifies the most task-relevant layers via a lightweight gradient probe and applies LoRA adapters only to those layers with asymmetric rank allocation. Across 81 experiment rows covering 14 successful models from 8 architecture families (0.5B-72B parameters, including dense and Mixture-of-Experts architectures), with one additional documented failed Pythia/GPT-NeoX attempt in Campaign 2, Aletheia achieves a 15-28% training speedup (mean 23.1%, p < 0.001) with bounded extra forgetting and broadly matched downstream behavior on the evaluated MMLU, GSM8K, and HumanEval benchmark pack. Across the tested families and scales, Campaign 1 shows a 100% per-model speed win rate and Campaign 2 shows broadly preserved downstream behavior within a bounded-degradation framing. Together these results support a practical model-economics claim: intelligent layer selection can make LoRA fine-tuning materially more efficient without introducing major downstream damage on the evaluated set.