Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures

arXiv cs.LG / 4/20/2026


Key Points

  • The paper introduces Aletheia, a gradient-guided method that selects the most task-relevant transformer layers for LoRA rather than applying adapters uniformly across all layers.
  • Aletheia uses a lightweight gradient probe to identify relevant layers and performs LoRA with asymmetric rank allocation only on those selected layers.
  • Across 81 experiment rows spanning 14 successful model variants from 8 architecture families (0.5B–72B parameters, including dense and Mixture-of-Experts models), Aletheia delivers a 15–28% training speedup (mean 23.1%).
  • The approach shows bounded extra forgetting and broadly matched downstream results on MMLU, GSM8K, and HumanEval; a second campaign reports preserved behavior, with one documented failed attempt (Pythia/GPT-NeoX).
  • Overall, the results support a practical "model economics" claim: intelligent layer selection can make LoRA fine-tuning significantly more efficient with limited degradation on the evaluated benchmarks.
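
The selection step described above can be sketched in a few lines. The snippet below assumes the gradient probe has already produced one L2 gradient norm per transformer layer; the scores, layer counts, rank budget, and the proportional-split rule for "asymmetric rank allocation" are illustrative assumptions, not details taken from the paper.

```python
import random

random.seed(0)

# Hypothetical probe output: per-layer gradient L2 norms collected from a
# single backward pass over a small probe batch (values are simulated here).
NUM_LAYERS = 24
grad_norms = [random.lognormvariate(0.0, 1.0) for _ in range(NUM_LAYERS)]

def select_layers(scores, k, total_rank, min_rank=2):
    """Keep the top-k layers by gradient norm and split a shared LoRA rank
    budget across them in proportion to their scores -- one plausible reading
    of gradient-guided selection with asymmetric rank allocation."""
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    mass = sum(scores[i] for i in top)
    # Each selected layer gets at least min_rank; the rest of the budget is
    # distributed proportionally to its probe score.
    return {i: max(min_rank, round(total_rank * scores[i] / mass)) for i in top}

# Adapt only 6 of 24 layers, sharing a budget of 48 rank units; every other
# layer stays frozen with no adapter at all.
ranks = select_layers(grad_norms, k=6, total_rank=48)
for layer, r in sorted(ranks.items()):
    print(f"layer {layer:2d}: LoRA rank {r}")
```

In a real fine-tuning run, the dictionary returned here would drive adapter construction, e.g. instantiating LoRA matrices of the given rank only on the selected layers' projection weights.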

Abstract

Low-Rank Adaptation (LoRA) has become the dominant parameter-efficient fine-tuning method for large language models, yet standard practice applies LoRA adapters uniformly to all transformer layers regardless of their relevance to the downstream task. We introduce Aletheia, a gradient-guided layer selection method that identifies the most task-relevant layers via a lightweight gradient probe and applies LoRA adapters only to those layers with asymmetric rank allocation. Across 81 experiment rows covering 14 successful models from 8 architecture families (0.5B-72B parameters, including dense and Mixture-of-Experts architectures), with one additional documented failed Pythia/GPT-NeoX attempt in Campaign 2, Aletheia achieves a 15-28% training speedup (mean 23.1%, p < 0.001) with bounded extra forgetting and broadly matched downstream behavior on the evaluated MMLU, GSM8K, and HumanEval benchmark pack. Across the tested families and scales, Campaign 1 shows a 100% per-model speed win rate and Campaign 2 shows broadly preserved downstream behavior within a bounded-degradation framing. Together these results support a practical model-economics claim: intelligent layer selection can make LoRA fine-tuning materially more efficient without introducing major downstream damage on the evaluated set.