ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning

arXiv cs.CL / April 22, 2026


Key Points

  • ShadowPEFT proposes a new parameter-efficient fine-tuning method for LLMs that freezes the pretrained backbone while refining representations via a centralized, depth-shared “shadow” module rather than distributed low-rank weight perturbations like LoRA.
  • It maintains a parallel shadow state at each transformer layer and evolves this state repeatedly to build progressively richer hidden representations, shifting adaptation from weight space to layer-space refinement.
  • Because the shadow module is decoupled from the backbone, it can be reused across depth, pretrained independently, and optionally deployed in a detached mode suited for edge computing.
  • Experiments on generation and understanding benchmarks indicate ShadowPEFT matches or outperforms LoRA and DoRA under comparable trainable-parameter budgets, with further evidence from analyses on pretraining, transfer, scaling, latency, and system performance.
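The mechanism described above can be illustrated with a minimal NumPy sketch. The parameterization here (a down-projection, a state-transition matrix, and an up-projection, all shared across depth) is an assumption for illustration, not the paper's actual design: a frozen backbone produces hidden states layer by layer, while a single small shadow module evolves a parallel shadow state and adds its refinement back into each layer's output.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_shadow = 16, 4   # hidden size and shadow-state size (illustrative)
n_layers = 3

# Frozen pretrained backbone: one weight matrix per layer (not trained).
backbone = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]

# Depth-shared shadow module: one small set of trainable weights reused at
# every layer (hypothetical parameterization, not the paper's exact form).
W_down = rng.standard_normal((d, d_shadow)) / np.sqrt(d)          # read hidden state
W_state = rng.standard_normal((d_shadow, d_shadow)) / np.sqrt(d_shadow)  # evolve state
W_up = rng.standard_normal((d_shadow, d)) * 0.01                  # write refinement back

def forward(x):
    h = x
    s = np.zeros(d_shadow)                     # shadow state, evolved across depth
    for W in backbone:
        h = np.tanh(h @ W)                     # frozen backbone layer
        s = np.tanh(s @ W_state + h @ W_down)  # shadow module updates its state
        h = h + s @ W_up                       # layer-space refinement of hidden state
    return h

out = forward(rng.standard_normal(d))
print(out.shape)
```

Note that only `W_down`, `W_state`, and `W_up` would be trained, and because they are shared across layers, the trainable-parameter count is independent of depth.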

Abstract

Parameter-efficient fine-tuning (PEFT) reduces the training cost of full-parameter fine-tuning for large language models (LLMs) by training only a small set of task-specific parameters while freezing the pretrained backbone. However, existing approaches, such as Low-Rank Adaptation (LoRA), achieve adaptation by inserting independent low-rank perturbations directly into individual weights, resulting in a local parameterization of adaptation. We propose ShadowPEFT, a centralized PEFT framework that instead performs layer-level refinement through a depth-shared shadow module. At each transformer layer, ShadowPEFT maintains a parallel shadow state and evolves it repeatedly to build progressively richer hidden states. This design shifts adaptation from distributed weight-space perturbations to a shared layer-space refinement process. Since the shadow module is decoupled from the backbone, it can be reused across depth, independently pretrained, and optionally deployed in a detached mode, benefiting edge-computing scenarios. Experiments on generation and understanding benchmarks show that ShadowPEFT matches or outperforms LoRA and DoRA under comparable trainable-parameter budgets. Additional analyses on shadow pretraining, cross-dataset transfer, parameter scaling, inference latency, and system-level evaluation suggest that centralized layer-space adaptation is a competitive and flexible alternative to conventional low-rank PEFT.
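The "comparable trainable-parameter budgets" claim can be made concrete with back-of-the-envelope arithmetic. All sizes below (hidden width, depth, LoRA rank, shadow width) are illustrative assumptions, not numbers from the paper: LoRA's count grows linearly with depth because each layer gets its own adapters, whereas a depth-shared module's count is fixed regardless of how many layers reuse it.

```python
d = 4096          # hidden size (illustrative)
n_layers = 32     # transformer depth (illustrative)
r = 8             # LoRA rank (illustrative)

# LoRA: two rank-r factors per adapted d x d weight, replicated at every
# layer (one adapted matrix per layer assumed here for simplicity).
lora_per_matrix = 2 * d * r
lora_total = lora_per_matrix * n_layers

# Depth-shared shadow module (hypothetical sizing): one down-projection,
# one state-transition matrix, and one up-projection, reused at every
# layer, so the count does not grow with depth.
d_s = 64          # shadow width (assumption)
shadow_total = d * d_s + d_s * d_s + d_s * d

print(lora_total, shadow_total)
```

Under these toy settings the shared module uses roughly a quarter of LoRA's budget, and doubling the depth would double `lora_total` while leaving `shadow_total` unchanged.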