AI Navigate

Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights

arXiv cs.LG / 3/16/2026


Key Points

  • The paper finds that privacy vulnerability in neural networks exists in only a very small fraction of weights, suggesting a targeted privacy-preserving approach.
  • It also shows that most of these privacy-critical weights heavily affect utility, indicating a trade-off between privacy and performance.
  • The importance of weights is argued to stem from their locations within the network rather than their raw values.
  • Based on these insights, the authors propose scoring critical weights and rewinding only those weights for fine-tuning rather than retraining or discarding neurons.
  • Experiments indicate that this weight-level rewind method offers stronger resilience against membership inference attacks while maintaining model utility across diverse settings.
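The score-and-rewind idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the scoring criterion (here, arbitrary per-weight scores), the rewind target (an earlier checkpoint of the same weights), and the `fraction` of weights rewound are all hypothetical placeholders, since the summary only states that critical weights are scored and rewound before fine-tuning.

```python
import numpy as np

def rewind_critical_weights(current, checkpoint, scores, fraction=0.01):
    """Rewind the top-scoring fraction of weights to checkpoint values.

    current, checkpoint: 1-D arrays of flattened weights (final model and
        an earlier training checkpoint).
    scores: per-weight privacy-criticality scores (hypothetical; the
        paper's exact scoring rule is not given in this summary).
    Returns the rewound weight vector and the indices that were rewound.
    """
    k = max(1, int(fraction * current.size))
    # indices of the k most privacy-critical weights (largest scores)
    critical = np.argpartition(-scores, k - 1)[:k]
    rewound = current.copy()
    rewound[critical] = checkpoint[critical]  # restore earlier values
    return rewound, critical

# toy usage: rewind the top 5% of 100 weights, then fine-tune as usual
rng = np.random.default_rng(0)
w_final = rng.normal(size=100)
w_ckpt = rng.normal(size=100)
scores = np.abs(rng.normal(size=100))
w_new, idx = rewind_critical_weights(w_final, w_ckpt, scores, fraction=0.05)
```

Note that only the selected entries change; all other weights keep their trained values, which is what distinguishes this weight-level rewind from retraining the full network or pruning whole neurons.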

Abstract

Prior approaches to membership privacy preservation usually update or retrain all weights in a neural network, which is costly and can cause unnecessary utility loss, or even worsen the misalignment in predictions between training and non-training data. In this work, we observe three insights: i) privacy vulnerability exists in only a very small fraction of weights; ii) however, most of those weights also critically affect utility; iii) the importance of a weight stems from its location rather than its value. Guided by these insights, we score privacy-critical weights and, instead of discarding the corresponding neurons, rewind only those weights before fine-tuning. Through extensive experiments, we show that this mechanism exhibits superior resilience against Membership Inference Attacks in most cases while maintaining utility.