Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation
arXiv cs.LG · April 28, 2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- The paper argues that parameter-efficient fine-tuning (e.g., LoRA, IA3) does not automatically translate into memory efficiency for on-device LLM adaptation.
- It shows that even with far fewer trainable parameters, PEFT methods can still require saving intermediate activations that grow linearly with sequence length, leading to out-of-memory failures on device (see the sketch after this list).
- The authors propose LARS (Low-memory Activation-Rank Subspace), which constrains the activation subspace during training to decouple memory usage from sequence length.
- Experiments report average memory reductions of 33.54% on GPUs and 51.95% on CPUs versus LoRA, while maintaining competitive accuracy and throughput across multiple datasets and model types.
- The framework is also demonstrated on Raspberry Pi and consumer-grade CPUs, indicating a practical route to personalized LLM adaptation on resource-limited edge hardware.