Reference-Guided Machine Unlearning

arXiv cs.LG / 3/13/2026

Key Points

  • The paper introduces Reference-Guided Unlearning (ReGUn), a framework to remove the influence of specific data from trained models while preserving overall utility.
  • It argues that existing approximate unlearning methods rely on performance-degradation signals like loss maximization or random labeling, which can be unstable and harm generalization.
  • ReGUn uses a disjoint held-out dataset as a principled, class-conditioned reference for distillation, making the model's behavior on forget data distributionally indistinguishable from its behavior on truly unseen data.
  • Across various model architectures, natural image datasets, and varying forget fractions, ReGUn consistently outperforms standard baselines, achieving a better forgetting-utility trade-off.
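To make the critique in the second bullet concrete, here is a minimal toy sketch (not the paper's code; all names and shapes are illustrative assumptions) of the two degradation-style unlearning signals on a softmax classifier: negating the forget-set loss, and fine-tuning toward random wrong labels. Both push the model away from the forget data without specifying *where* it should end up, which is the instability the paper targets.

```python
# Toy sketch of degradation-style unlearning signals (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
num_classes = 5
n_forget = 4
logits = rng.normal(size=(n_forget, num_classes))  # model outputs on forget examples
true_labels = np.array([0, 2, 1, 4])

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

probs = softmax(logits)

# (1) Loss maximization ("gradient ascent"): minimize the *negated*
# cross-entropy on the forget set, so the forget loss grows without bound.
ce = -np.log(probs[np.arange(n_forget), true_labels])
ascent_objective = -ce.mean()  # unbounded below; poorly conditioned

# (2) Random labeling: replace each forget label with a random wrong class
# and fine-tune toward it; the target is arbitrary rather than a
# principled reference distribution.
random_labels = np.array(
    [rng.choice([c for c in range(num_classes) if c != y]) for y in true_labels]
)
random_label_loss = -np.log(probs[np.arange(n_forget), random_labels]).mean()
```

Neither objective says what the post-unlearning predictions should look like, which is the gap the reference-guided formulation fills.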

Abstract

Machine unlearning aims to remove the influence of specific data from trained models while preserving general utility. Existing approximate unlearning methods often rely on performance-degradation heuristics, such as loss maximization or random labeling. However, these signals can be poorly conditioned, leading to unstable optimization and harming the model's generalization. We argue that unlearning should instead prioritize distributional indistinguishability, aligning the model's behavior on forget data with its behavior on truly unseen data. Motivated by this, we propose Reference-Guided Unlearning (ReGUn), a framework that leverages a disjoint held-out dataset to provide a principled, class-conditioned reference for distillation. We demonstrate across various model architectures, natural image datasets, and varying forget fractions that ReGUn consistently outperforms standard approximate baselines, achieving a superior forgetting-utility trade-off.
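The abstract's key idea can be sketched numerically. The following toy example (a hypothetical reconstruction, not the authors' implementation; the held-out set, shapes, and loss form are assumptions) builds a class-conditioned reference from the model's average predictive distribution on held-out data, then distills forget-set predictions toward it with a KL term:

```python
# Toy sketch of class-conditioned reference distillation (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
num_classes = 3

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Disjoint held-out examples the model never trained on (4 per class here).
heldout_logits = rng.normal(size=(12, num_classes))
heldout_labels = np.repeat(np.arange(num_classes), 4)
heldout_probs = softmax(heldout_logits)

# Class-conditioned reference: the model's average predictive distribution
# on held-out examples of each class, i.e. its behavior on truly unseen data.
reference = np.stack(
    [heldout_probs[heldout_labels == c].mean(axis=0) for c in range(num_classes)]
)

# Distillation target for a forget example of class c: pull the model's
# prediction toward reference[c] via KL(reference_c || p_model).
forget_logits = rng.normal(size=(2, num_classes))
forget_labels = np.array([0, 2])
p = softmax(forget_logits)
ref = reference[forget_labels]
kl = (ref * (np.log(ref) - np.log(p))).sum(axis=1)
distill_loss = kl.mean()  # minimized during unlearning fine-tuning
```

Minimizing this loss gives the optimizer a well-defined, bounded target (match unseen-data behavior) instead of an unbounded degradation signal.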