GAIN: Multiplicative Modulation for Domain Adaptation

arXiv cs.LG / 4/7/2026


Key Points

  • LLM domain adaptation can cause catastrophic forgetting because common adaptation methods like full fine-tuning or LoRA introduce new directions in the model’s weight space.
  • The paper proposes GAIN (Multiplicative Modulation), which re-emphasizes existing features via multiplicative scaling W_new = S * W using a learned diagonal matrix S applied to the attention output projection and optionally the FFN.
  • Experiments across five model families (774M–70B) and eight sequential domain adaptations show GAIN-FFN matches LoRA on in-domain validation PPL.
  • Critically, GAIN-FFN reduces forgetting: previously trained domains improve by 7–13% in validation PPL, while LoRA degrades them by 18–36%. For example, after seven sequential adaptations on Qwen2.5, GAIN-FFN degrades BoolQ accuracy by only 0.8%, versus 14.9% for LoRA.
  • GAIN introduces a modest parameter overhead (46K–230K per model) and can be absorbed into pretrained weights, yielding zero additional inference cost.

Abstract

Adapting LLMs to new domains causes forgetting because standard methods (full fine-tuning, LoRA) inject new directions into the weight space. We propose GAIN, which re-emphasizes existing features through multiplicative modulation W_new = S * W. The learned diagonal matrix S is applied to the attention output projection and optionally the FFN. The principle mirrors gain modulation in neuroscience, where neurons adapt to context by scaling response strength while preserving selectivity. We evaluate GAIN on five models from four families (774M to 70B), adapting sequentially across eight domains. GAIN-FFN matches LoRA's in-domain adaptation, but their effects on previously trained domains are opposite: GAIN-FFN improves them by 7-13% (validation PPL), while LoRA degrades them by 18-36%. Downstream accuracy confirms the pattern: for example, after seven sequential adaptations on Qwen2.5, GAIN-FFN degrades BoolQ by only 0.8% while LoRA damages it by 14.9%. GAIN adds 46K-230K parameters per model and can be absorbed into the pretrained weights for zero inference cost.
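The core operation is simple enough to sketch in a few lines. Below is a minimal, illustrative Python example (plain lists, no deep-learning framework; function names and values are hypothetical, not from the paper) showing how a learned diagonal gain vector s rescales the rows of a pretrained weight matrix W, and why S can be absorbed into W for zero additional inference cost:

```python
# Minimal sketch of multiplicative modulation: W_new = diag(s) @ W.
# The paper applies this to the attention output projection and
# optionally the FFN; here we use a tiny 2x2 weight for illustration.

def modulate(W, s):
    """Scale row i of W by the learned gain s[i]: W_new = diag(s) @ W."""
    return [[s[i] * w for w in row] for i, row in enumerate(W)]

def matvec(W, x):
    """Compute W @ x for a list-of-lists matrix and a list vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Pretrained weight and learned per-output-feature gains (toy values).
W = [[1.0, 2.0], [3.0, 4.0]]
s = [0.5, 2.0]

W_new = modulate(W, s)

# Absorbing S into W gives identical outputs with no extra inference
# cost: applying W_new directly equals scaling the output of W by s.
x = [1.0, 1.0]
y_absorbed = matvec(W_new, x)
y_scaled = [si * yi for si, yi in zip(s, matvec(W, x))]
assert y_absorbed == y_scaled
```

Because S is diagonal, it only re-weights existing output features rather than introducing new directions in weight space, which is the paper's explanation for the reduced forgetting; the per-layer parameter count is just the output dimension, consistent with the reported 46K–230K total overhead.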