Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning
arXiv cs.CL / 5/4/2026
Key Points
- The paper introduces TokenUnlearn, a token-level framework for machine unlearning in large language models that avoids the limitations of existing sequence-level methods.
- It uses knowledge-aware masking and entropy-aware signals to compute token importance scores, enabling more precise targeting of the subset of tokens that actually encode the knowledge to be removed.
- Two strategies are proposed: hard selection (apply unlearning only to high-importance tokens) and soft weighting (scale gradient contributions by importance).
- The authors provide theoretical evidence that token-level selection improves the gradient signal-to-noise ratio and reduces suboptimal forgetting.
- Experiments on TOFU and WMDP across three model architectures show consistent gains in forgetting effectiveness while preserving overall utility compared with sequence-level baselines.
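The two strategies in the third bullet can be sketched numerically. The snippet below is a minimal illustration, not the paper's actual implementation: the importance scores, per-token NLL values, and function names are all hypothetical, and the real method computes importance from knowledge-aware masking and entropy-aware signals rather than taking it as given.

```python
import numpy as np

def hard_select_weights(importance, threshold):
    # Hard selection: unlearn only tokens whose importance
    # meets or exceeds the threshold (binary mask).
    return (importance >= threshold).astype(float)

def soft_weights(importance):
    # Soft weighting: scale each token's gradient contribution
    # by its normalized importance score.
    return importance / importance.sum()

# Hypothetical per-token importance scores for a 5-token forget sample.
imp = np.array([0.05, 0.90, 0.10, 0.70, 0.02])

hard = hard_select_weights(imp, threshold=0.5)  # mask: [0, 1, 0, 1, 0]
soft = soft_weights(imp)                        # sums to 1.0

# Gradient-ascent-style unlearning loss: negate the (weighted) per-token
# NLL so minimizing the loss pushes probability away from these tokens.
token_nll = np.array([2.1, 0.4, 1.8, 0.9, 2.5])  # hypothetical values
hard_loss = -(hard * token_nll).sum() / hard.sum()  # averages selected tokens
soft_loss = -(soft * token_nll).sum()
```

Under hard selection only the two high-importance tokens contribute, which is the mechanism the theoretical analysis credits with a better gradient signal-to-noise ratio: low-importance tokens no longer dilute the update.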