G-Drift MIA: Membership Inference via Gradient-Induced Feature Drift in LLMs
arXiv cs.LG / 4/2/2026
Key Points
- The paper introduces G-Drift MIA, a white-box membership inference attack for LLMs that uses a single targeted gradient-ascent step to induce measurable “feature drift” in internal representations.
- Instead of relying mainly on output confidence or loss/perplexity, it compares representation changes across logits, hidden-layer activations, and projections onto fixed feature directions to train a lightweight logistic classifier for member vs. non-member detection.
- Experiments across multiple transformer LLMs and realistic benchmark-derived datasets show that G-Drift substantially outperforms prior confidence-, perplexity-, and reference-based MIA approaches, which often perform near random when training and query samples come from the same distribution.
- The authors provide a mechanistic explanation: memorized training samples show smaller and more structured feature drift than non-members, linking gradient geometry, representation stability, and memorization.
- Overall, the results position small, controlled gradient interventions as an effective auditing technique for assessing LLM privacy risk related to whether specific data points were included in training.
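The core mechanism described above (take one gradient-ascent step on a sample's own loss, then measure how far internal representations move) can be illustrated on a toy model. The sketch below is not the paper's implementation: it uses a tiny numpy tanh network instead of an LLM, a single scalar drift feature instead of the paper's multi-feature logistic classifier, and all names (`feature_drift`, `eta`, the model shapes) are hypothetical. It only demonstrates the claimed qualitative effect: well-fit (memorized) samples have near-zero gradients, so the same perturbation moves their hidden features less than it moves non-members'.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model" (stand-in for an LLM): hidden = tanh(W @ x), output = v @ hidden.
d, h = 8, 16
W = rng.normal(size=(h, d)) * 0.3
v = rng.normal(size=h) * 0.3  # fixed readout; only W is trained/perturbed

def hidden(W, x):
    return np.tanh(W @ x)

def grad_W(W, v, x, y):
    # Gradient of the squared loss 0.5*(v·tanh(Wx) - y)^2 w.r.t. W.
    hid = np.tanh(W @ x)
    err = v @ hid - y
    return err * np.outer(v * (1 - hid ** 2), x)

def feature_drift(W, v, x, y, eta=0.1):
    """G-Drift-style signal (toy version): one gradient-ascent step on this
    sample's own loss, then the L2 distance the hidden features moved."""
    W_pert = W + eta * grad_W(W, v, x, y)  # ascent: push the loss up
    return np.linalg.norm(hidden(W_pert, x) - hidden(W, x))

# Fit the toy model on "member" samples so their per-sample loss shrinks.
members = [(rng.normal(size=d), rng.normal()) for _ in range(20)]
for _ in range(500):
    for x, y in members:
        W -= 0.05 * grad_W(W, v, x, y)  # plain per-sample gradient descent

non_members = [(rng.normal(size=d), rng.normal()) for _ in range(20)]

drift_m = np.mean([feature_drift(W, v, x, y) for x, y in members])
drift_n = np.mean([feature_drift(W, v, x, y) for x, y in non_members])

# Members were fit, so their gradients (and hence drift) are small;
# non-members' gradients are not, so the same step moves them further.
print(f"mean drift: members={drift_m:.4f}  non-members={drift_n:.4f}")
```

In the paper's setting, drift would be collected for logits, hidden-layer activations, and fixed feature-direction projections, and a lightweight logistic classifier would be trained on those features; here a simple threshold between the two drift means already separates the groups.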