The Measure of Deception: An Analysis of Data Forging in Machine Unlearning

arXiv stat.ML · May 5, 2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper studies “forging” in machine unlearning, where adversaries craft data whose gradients resemble a target point’s gradient to create the illusion that the model has forgotten that data.
  • It formalizes this by defining an “ε-forging set” of data points whose gradients approximate a target gradient within tolerance ε (illustrated in the sketch after this list).
  • For linear regression and one-layer neural networks, the authors prove the forging set has small Lebesgue measure, scaling roughly as ε (or ε^d when ε is sufficiently small).
  • Under mild regularity assumptions, the forging set’s measure is shown to decay like ε^((d−r)/2), where r < d is determined by the “small” singular values of a variation matrix built from the model gradients; extensions to batch SGD and almost-everywhere smooth losses yield the same scaling, implying strong limits on how feasible adversarial forging is.
  • The work also provides probability bounds indicating that, under non-degenerate data distributions, randomly sampling a forging point is extremely unlikely, suggesting false unlearning claims could be detectable in principle.
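
As a rough, self-contained illustration of the forging-set idea (a hypothetical setup of my own, not the paper’s construction), the Python sketch below uses Monte Carlo sampling in the linear-regression case: it estimates the fraction of a bounded data region whose loss gradients land within ε of a target point’s gradient. Under a uniform distribution, that fraction is exactly the probability of randomly sampling a forging point, and it should shrink rapidly as ε decreases, in the spirit of the measure and probability bounds summarized above.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 3                       # data dimension (arbitrary choice for illustration)
w = rng.normal(size=d)      # fixed, already-trained model parameters

def grad(x, y):
    """Gradient w.r.t. w of the squared loss 0.5 * (w @ x - y)^2."""
    return (x @ w - y) * x

# Target point whose gradient an adversary would try to mimic.
x_star, y_star = rng.normal(size=d), rng.normal()
g_star = grad(x_star, y_star)

def forging_fraction(eps, n=200_000, box=3.0):
    """Fraction of points in [-box, box]^(d+1) whose gradient is eps-close to g_star."""
    xs = rng.uniform(-box, box, size=(n, d))
    ys = rng.uniform(-box, box, size=n)
    grads = (xs @ w - ys)[:, None] * xs      # row-wise gradients, shape (n, d)
    return np.mean(np.linalg.norm(grads - g_star, axis=1) <= eps)

for eps in (1.0, 0.3, 0.1, 0.03):
    print(f"eps = {eps:4.2f}  ->  estimated forging fraction ≈ {forging_fraction(eps):.1e}")
```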

Abstract

Motivated by privacy regulations and the need to mitigate the effects of harmful data, machine unlearning seeks to modify trained models so that they effectively “forget” designated data. A key challenge in verifying unlearning is “forging”: adversarially crafting data that mimics the gradient of a target point, thereby creating the appearance of unlearning without actually removing information. To capture this phenomenon, we consider the collection of data points whose gradients approximate a target gradient within tolerance ε, which we call an ε-forging set, and develop a framework for its analysis. For linear regression and one-layer neural networks, we show that the Lebesgue measure of this set is small: it scales on the order of ε, and, when ε is small enough, on the order of ε^d. More generally, under mild regularity assumptions, we prove that the forging set measure decays as ε^((d−r)/2), where d is the data dimension and r < d is the dimension of the span of right singular vectors corresponding to “small” singular values of a variation matrix defined by the model gradients. Extensions to batch SGD and almost-everywhere smooth loss functions yield the same asymptotic scaling. In addition, we establish probability bounds showing that, under non-degenerate data distributions, the likelihood of randomly sampling a forging point is vanishingly small. These results provide evidence that adversarial forging is fundamentally limited and that false unlearning claims can, in principle, be detected.
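
Restated symbolically (my paraphrase of the abstract; the notation, in particular F_ε, λ, and the form of the variation matrix, is an assumption and may differ from the paper’s):

```latex
% Paraphrase of the abstract's definitions; the notation here is mine.
% Fix trained parameters w and a target point z* with gradient \nabla_w \ell(w; z^*).
\[
  F_\epsilon(z^*) = \left\{ z \in \mathbb{R}^d :
    \left\| \nabla_w \ell(w; z) - \nabla_w \ell(w; z^*) \right\| \le \epsilon \right\}
\]
% Main scaling claim under the stated regularity assumptions:
\[
  \lambda\!\left( F_\epsilon(z^*) \right) = O\!\left( \epsilon^{(d-r)/2} \right),
\]
% where \lambda is Lebesgue measure, d the data dimension, and r < d the dimension
% of the span of right singular vectors associated with the "small" singular values
% of the gradient variation matrix.
```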