Quantization-Robust LLM Unlearning via Low-Rank Adaptation

arXiv cs.CL / 3/30/2026


Key Points

  • The paper addresses a practical problem in LLM machine unlearning: post-training quantization (PTQ), especially aggressive low-bit (e.g., 4-bit) quantization, can hide the effects of unlearning updates so the model “reverts” toward pre-unlearning behavior.
  • It argues that full-parameter fine-tuning often produces weight changes too small to remain distinguishable after 4-bit quantization, motivating an alternative training strategy.
  • The authors propose quantization-robust unlearning using low-rank adaptation (LoRA), freezing the base LLM and applying the forgetting change primarily through trainable adapters to preserve the effective update under quantization.
  • Experiments on Llama-2-7B with the MUSE benchmark show improved 4-bit utility over full-parameter baselines: up to +7.93 points on BOOKS (NPO+GDR) and +4.76 points on NEWS (GA+GDR).
  • The method also reduces privacy leakage under 4-bit PTQ while maintaining strong forgetting metrics (e.g., PrivLeak moves substantially closer to ideal 0 in reported cases).
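To make the masking effect concrete, here is a minimal NumPy sketch (not the paper's code; the `quantize_4bit` helper and all magnitudes are illustrative assumptions) showing that a full-parameter update much smaller than the 4-bit quantization step is rounded away almost everywhere:

```python
import numpy as np

def quantize_4bit(w, scale):
    """Uniform symmetric 4-bit PTQ: round each weight to one of 16 levels."""
    return np.clip(np.round(w / scale), -8, 7) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=10_000)       # toy full-precision weights
scale = np.abs(w).max() / 7                # per-tensor scale for the signed 4-bit grid

delta = rng.normal(0, 1e-4, size=10_000)   # tiny full-parameter "unlearning" update
same = np.mean(quantize_4bit(w + delta, scale) == quantize_4bit(w, scale))
print(f"fraction of weights unchanged after 4-bit PTQ: {same:.3f}")
```

With the update two orders of magnitude below the quantization step, nearly every weight rounds to the same level before and after fine-tuning, so the quantized model behaves as if no unlearning had happened.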

Abstract

Large Language Model (LLM) unlearning aims to remove targeted knowledge from a trained model, but practical deployments often require post-training quantization (PTQ) for efficient inference. However, aggressive low-bit PTQ can mask unlearning updates, causing quantized models to revert to pre-unlearning behavior. We show that standard full-parameter fine-tuning often induces parameter changes that are too small to survive 4-bit quantization. We propose quantization-robust unlearning via low-rank adaptation (LoRA): we freeze the base model and concentrate unlearning into trainable adapters so that the effective update is preserved after quantization. On Llama-2-7B evaluated with the MUSE dataset (BOOKS and NEWS), LoRA improves 4-bit utility by up to 7.93 points (NPO+GDR on BOOKS: 50.17 to 58.10) and yields higher 4-bit utility on NEWS for GA+GDR (40.06 to 44.82, an increase of 4.76). LoRA also substantially reduces privacy leakage under 4-bit PTQ, e.g., for GA+KLR on BOOKS, PrivLeak moves from -25.68 to -5.86 (closer to the ideal 0), while maintaining strong forgetting (VerbMem and KnowMem near 0). Thus, using LoRA for machine unlearning is beneficial in scenarios where quantization is necessary for model deployment.
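The remedy described in the abstract can be sketched in the same toy setting: quantize only the frozen base weight and keep the low-rank update in full precision, so the unlearning effect survives deployment. This is a minimal illustration, not the paper's implementation; the dimensions, rank, and magnitudes are assumptions chosen for the demo.

```python
import numpy as np

def quantize_4bit(W, scale):
    """Uniform symmetric 4-bit PTQ: round each weight to one of 16 levels."""
    return np.clip(np.round(W / scale), -8, 7) * scale

rng = np.random.default_rng(1)
d, r = 64, 4
W = rng.normal(0, 0.02, size=(d, d))       # frozen base weight (toy)
scale = np.abs(W).max() / 7

# LoRA-style adapter: the unlearning update lives in a low-rank pair B @ A
A = rng.normal(0, 1e-3, size=(r, d))
B = rng.normal(0, 1e-3, size=(d, r))
delta = B @ A                              # tiny effective weight update

x = rng.normal(size=d)
y_base   = quantize_4bit(W, scale) @ x
y_folded = quantize_4bit(W + delta, scale) @ x   # update folded into W, then PTQ
y_lora   = y_base + delta @ x                    # adapter applied after PTQ

# Folding loses (almost all of) the update; the adapter path keeps it exactly.
frac_same = np.mean(quantize_4bit(W + delta, scale) == quantize_4bit(W, scale))
adapter_effect = np.linalg.norm(y_lora - y_base)
print(f"quantized weights unchanged by folded update: {frac_same:.4f}")
print(f"adapter contribution norm: {adapter_effect:.2e}")
```

The design point mirrors the paper's argument: because the adapter is added on top of the quantized base rather than rounded into it, its contribution to the output is preserved bit-for-bit regardless of how coarse the base quantization grid is.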