OASIS: Online Activation Subspace Learning for Memory-Efficient Training

arXiv cs.LG / 4/13/2026


Key Points

  • The paper introduces OASIS, an online activation subspace learning method that maintains a continuously updated low-dimensional representation of activations during LLM training.
  • By projecting intermediate activations into this evolving subspace, OASIS reduces activation memory without changing forward-pass computation, keeping model behavior intact.
  • The learned activation subspace also yields low-rank gradient representations, allowing gradients and optimizer states to be stored/maintained in the reduced space.
  • A projection-aware optimizer is proposed to transport optimizer states across subspace updates, aiming for training stability while the subspace evolves.
  • Experiments across pretraining and finetuning tasks report up to 2× lower peak memory than full fine-tuning while matching performance and outperforming prior low-rank approaches.
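The core mechanism described in the key points — tracking a low-dimensional activation subspace online and storing only the projected activations — can be illustrated with a minimal numpy sketch. This is a hypothetical instantiation using a decayed covariance estimate and subspace iteration; the paper's actual update rule, rank schedule, and integration with the backward pass are not specified here, and all names (`OnlineSubspace`, `compress`, `decompress`) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 64, 8, 256  # hypothetical hidden size, subspace rank, tokens per batch

class OnlineSubspace:
    """Hypothetical online tracker of a dominant r-dimensional activation subspace."""

    def __init__(self, d, r, decay=0.99):
        # d x r orthonormal basis, initialized randomly via QR.
        self.P = np.linalg.qr(rng.standard_normal((d, r)))[0]
        self.C = np.zeros((d, d))  # running estimate of the activation covariance
        self.decay = decay

    def update(self, A):
        # A: (n, d) batch of activations. Refresh the covariance estimate with
        # exponential decay, then take one subspace-iteration step (multiply by
        # the covariance and re-orthonormalize) toward the top-r eigenspace.
        self.C = self.decay * self.C + (1 - self.decay) * (A.T @ A) / len(A)
        self.P = np.linalg.qr(self.C @ self.P)[0]

    def compress(self, A):
        return A @ self.P    # (n, r): low-dimensional coefficients to store

    def decompress(self, Z):
        return Z @ self.P.T  # (n, d): approximate reconstruction for backward

sub = OnlineSubspace(d, r)
# Synthetic activations with approximately rank-r structure plus small noise.
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, d)) \
    + 0.01 * rng.standard_normal((n, d))
for _ in range(20):
    sub.update(A)

Z = sub.compress(A)
A_hat = sub.decompress(Z)
err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
print(f"stored floats: {Z.size} vs {A.size}; relative reconstruction error: {err:.3f}")
```

Because only the `(n, r)` coefficients are kept instead of the `(n, d)` activations, the stored footprint shrinks by a factor of `d / r` (8× in this toy setting), at the cost of the reconstruction error shown above.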

Abstract

Training large language models (LLMs) is constrained by memory requirements, with activations accounting for a substantial fraction of the total footprint. Existing approaches reduce memory using low-rank weight parameterizations or low-rank gradient subspaces for optimizer states, while activation memory is addressed through architectural modifications or compression schemes based on periodically updated projections. We propose OASIS, an online activation subspace learning algorithm for memory-efficient training that tracks and continuously updates a low-dimensional activation subspace during training. Intermediate activations are projected onto this evolving subspace, reducing memory without modifying forward-pass computations. The evolving activation subspace induces low-rank gradient representations, enabling both gradients and optimizer states to be maintained directly in this subspace, while a projection-aware optimizer consistently transports optimizer states across subspace updates for stable training. Across various finetuning and pretraining tasks, OASIS achieves up to 2× lower peak memory than full fine-tuning while matching its performance and outperforming prior low-rank methods.
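The abstract's "projection-aware optimizer" that "transports optimizer states across subspace updates" can be sketched as a change of basis. The sketch below is an assumed interpretation, not the paper's algorithm: it keeps Adam-style moments in the low-rank space and, when the basis switches from `P_old` to `P_new`, re-expresses them via `R = P_new.T @ P_old` so the full-space first moment is preserved (up to the component outside the new subspace). The second-moment transport shown is a crude elementwise heuristic, labeled as such.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, k = 32, 4, 16  # hypothetical hidden size, subspace rank, parameter columns

def orthonormal(d, r):
    # Random d x r orthonormal basis via QR.
    return np.linalg.qr(rng.standard_normal((d, r)))[0]

# Two successive subspace bases; P_new replaces P_old mid-training.
P_old, P_new = orthonormal(d, r), orthonormal(d, r)

# Hypothetical low-rank optimizer moments kept in the old subspace (r x k).
m_old = rng.standard_normal((r, k))
v_old = rng.standard_normal((r, k)) ** 2

# Naive alternative: reset moments at every subspace switch, losing momentum.
# Projection-aware transport instead re-expresses them in the new basis:
R = P_new.T @ P_old
m_new = R @ m_old           # exact change of basis for the first moment
v_new = (R ** 2) @ v_old    # crude heuristic for the elementwise second moment

# Check: transporting m_old equals projecting the old full-space momentum
# P_old @ m_old onto the new subspace.
full_momentum = P_old @ m_old
print("first-moment transport error:",
      np.linalg.norm(P_new.T @ full_momentum - m_new))
```

The identity holds because `P_new.T @ (P_old @ m_old) = (P_new.T @ P_old) @ m_old = R @ m_old`; whatever momentum lies outside the span of `P_new` is necessarily dropped by any rank-r representation.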