Machine Unlearning under Retain-Forget Entanglement

arXiv cs.LG / March 30, 2026

Key Points

  • The paper studies a common challenge in machine unlearning where forgetting a target subset unintentionally harms retained samples due to feature or semantic correlations with the forget set.
  • It introduces a two-phase optimization framework: the first phase uses an augmented Lagrangian method to raise the loss on the forget set while preserving accuracy on less-related retained data.
  • The second phase applies a gradient projection step, regularized with the Wasserstein-2 distance, to limit degradation specifically on semantically related retained samples.
  • Experiments across multiple unlearning tasks, benchmark datasets, and neural network architectures show improved tradeoffs between retention accuracy and removal fidelity versus existing baselines.

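The first phase amounts to a constrained problem: maximize loss on the forget set subject to a budget on retain-set loss. A minimal one-parameter sketch of such an augmented Lagrangian loop with dual ascent is below; the toy losses, the budget `tau`, the penalty weight `rho`, and all step sizes are illustrative assumptions, not the paper's construction:

```python
# Toy phase-1 sketch: raise "forget" loss subject to a cap on "retain" loss,
# solved with an inequality-form augmented Lagrangian and dual ascent.
# All losses, budgets, and step sizes here are illustrative, not from the paper.

def forget_loss(w):       # loss on the forget set: we want this LARGE,
    return w ** 2         # so below we minimize its negation

def retain_loss(w):       # loss on retained samples: must stay <= tau
    return (w - 1.0) ** 2

tau, rho = 0.25, 10.0     # retain-loss budget and penalty weight
w, lam = 1.0, 0.0         # parameter and Lagrange multiplier

for _ in range(10):                        # outer dual-ascent rounds
    for _ in range(200):                   # inner primal minimization
        g = retain_loss(w) - tau           # constraint value, want g(w) <= 0
        u = max(0.0, lam / rho + g)        # active part of the constraint
        # d/dw of [ -forget_loss(w) + (rho/2) * u^2 ]
        grad = -2.0 * w + rho * u * 2.0 * (w - 1.0)
        w -= 0.05 * grad
    lam = max(0.0, lam + rho * (retain_loss(w) - tau))  # dual update

print(round(w, 3))  # settles near 1.5: forget loss maximized on the budget boundary
```

Dual ascent drives the multiplier toward the value that pins the solution to the retain-loss boundary, mirroring how phase 1 trades forget-set loss against retained accuracy.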
Abstract

Forgetting a subset in machine unlearning is rarely an isolated task. Often, retained samples that are closely related to the forget set can be unintentionally affected, particularly when they share correlated features from pretraining or exhibit strong semantic similarities. To address this challenge, we propose a novel two-phase optimization framework specifically designed to handle such retain-forget entanglements. In the first phase, an augmented Lagrangian method increases the loss on the forget set while preserving accuracy on less-related retained samples. The second phase applies a gradient projection step, regularized by the Wasserstein-2 distance, to mitigate performance degradation on semantically related retained samples without compromising the unlearning objective. We validate our approach through comprehensive experiments on multiple unlearning tasks, standard benchmark datasets, and diverse neural architectures, demonstrating that it achieves effective and reliable unlearning while outperforming existing baselines in both accuracy retention and removal fidelity.