Quantization-Robust LLM Unlearning via Low-Rank Adaptation

arXiv cs.CL / 3/30/2026


Key Points

  • The paper addresses a practical problem in LLM machine unlearning: post-training quantization (PTQ), especially aggressive low-bit (e.g., 4-bit) quantization, can hide the effects of unlearning updates so the model “reverts” toward pre-unlearning behavior.
  • It argues that full-parameter fine-tuning often produces weight changes too small to remain distinguishable after 4-bit quantization, motivating an alternative training strategy.
  • The authors propose quantization-robust unlearning using low-rank adaptation (LoRA), freezing the base LLM and applying the forgetting change primarily through trainable adapters to preserve the effective update under quantization.
  • Experiments on Llama-2-7B with the MUSE benchmark show improved 4-bit utility over full-parameter baselines: up to +7.93 points on BOOKS (NPO+GDR) and +4.76 points on NEWS (GA+GDR).
  • The method also reduces privacy leakage under 4-bit PTQ while maintaining strong forgetting metrics (e.g., PrivLeak moves substantially closer to ideal 0 in reported cases).
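To make the masking effect concrete, here is a minimal NumPy sketch (not the paper's code; the `quantize_4bit` helper and all magnitudes are illustrative assumptions) showing that a full-parameter update much smaller than the 4-bit quantization step is rounded away almost everywhere:

```python
import numpy as np

def quantize_4bit(w, scale):
    """Uniform symmetric 4-bit PTQ: round each weight to one of 16 levels."""
    return np.clip(np.round(w / scale), -8, 7) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=10_000)       # toy full-precision weights
scale = np.abs(w).max() / 7                # per-tensor scale for the signed 4-bit grid

delta = rng.normal(0, 1e-4, size=10_000)   # tiny full-parameter "unlearning" update
same = np.mean(quantize_4bit(w + delta, scale) == quantize_4bit(w, scale))
print(f"fraction of weights unchanged after 4-bit PTQ: {same:.3f}")
```

With the update two orders of magnitude below the quantization step, nearly every weight rounds to the same level before and after fine-tuning, so the quantized model behaves as if no unlearning had happened.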

Abstract

Large Language Model (LLM) unlearning aims to remove targeted knowledge from a trained model, but practical deployments often require post-training quantization (PTQ) for efficient inference. However, aggressive low-bit PTQ can mask unlearning updates, causing quantized models to revert to pre-unlearning behavior. We show that standard full-parameter fine-tuning often induces parameter changes that are too small to survive 4-bit quantization. We propose quantization-robust unlearning via low-rank adaptation (LoRA): we freeze the base model and concentrate unlearning into trainable adapters so that the effective update is preserved after quantization. On Llama-2-7B evaluated with the MUSE dataset (BOOKS and NEWS), LoRA improves 4-bit utility by up to 7.93 points (NPO+GDR on BOOKS: 50.17 to 58.10) and yields higher 4-bit utility on NEWS for GA+GDR (40.06 to 44.82, an increase of 4.76). LoRA also substantially reduces privacy leakage under 4-bit PTQ, e.g., for GA+KLR on BOOKS, PrivLeak moves from -25.68 to -5.86 (closer to the ideal 0), while maintaining strong forgetting (VerbMem and KnowMem near 0). Thus, using LoRA for machine unlearning is beneficial in scenarios where quantization is necessary for model deployment.
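The remedy described in the abstract can be sketched in the same toy setting: quantize only the frozen base weight and keep the low-rank update in full precision, so the unlearning effect survives deployment. This is a minimal illustration, not the paper's implementation; the dimensions, rank, and magnitudes are assumptions chosen for the demo.

```python
import numpy as np

def quantize_4bit(W, scale):
    """Uniform symmetric 4-bit PTQ: round each weight to one of 16 levels."""
    return np.clip(np.round(W / scale), -8, 7) * scale

rng = np.random.default_rng(1)
d, r = 64, 4
W = rng.normal(0, 0.02, size=(d, d))       # frozen base weight (toy)
scale = np.abs(W).max() / 7

# LoRA-style adapter: the unlearning update lives in a low-rank pair B @ A
A = rng.normal(0, 1e-3, size=(r, d))
B = rng.normal(0, 1e-3, size=(d, r))
delta = B @ A                              # tiny effective weight update

x = rng.normal(size=d)
y_base   = quantize_4bit(W, scale) @ x
y_folded = quantize_4bit(W + delta, scale) @ x   # update folded into W, then PTQ
y_lora   = y_base + delta @ x                    # adapter applied after PTQ

# Folding loses (almost all of) the update; the adapter path keeps it exactly.
frac_same = np.mean(quantize_4bit(W + delta, scale) == quantize_4bit(W, scale))
adapter_effect = np.linalg.norm(y_lora - y_base)
print(f"quantized weights unchanged by folded update: {frac_same:.4f}")
print(f"adapter contribution norm: {adapter_effect:.2e}")
```

The design point mirrors the paper's argument: because the adapter is added on top of the quantized base rather than rounded into it, its contribution to the output is preserved bit-for-bit regardless of how coarse the base quantization grid is.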