Selective Forgetting for Large Reasoning Models
arXiv cs.AI / 4/7/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- Large Reasoning Models that produce chain-of-thought (CoT) traces can leak sensitive information, motivating selective forgetting (machine unlearning) to mitigate ethical and legal risks.
- The paper argues that prior unlearning methods typically target only final answers, leaving sensitive content in the CoT, while naively unlearning entire CoT traces degrades the model's general reasoning ability.
- It proposes an unlearning framework for LRMs that selectively removes forget-relevant reasoning steps, using retrieval-augmented generation (RAG) together with multiple LLMs to locate the targeted CoT segments.
- Rather than deleting those segments outright, the method replaces them with benign placeholders, preserving the logical flow of the trace while suppressing the likelihood of regenerating the forgotten content (see the redaction sketch after this list).
- A dedicated feature replacement unlearning loss drives this substitution; experiments on synthetic and medical datasets suggest the approach suppresses the forgotten information while maintaining structurally valid reasoning behavior (a minimal loss sketch follows the list).
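The replacement idea is straightforward to illustrate. Below is a minimal sketch of the placeholder substitution step, assuming the forget-relevant spans have already been located by the RAG + multi-LLM pipeline described above (that pipeline is not shown); `ForgetSpan`, `PLACEHOLDER`, and `redact_cot` are illustrative names, not the paper's API.

```python
# Hypothetical sketch: swap forget-relevant CoT segments for benign
# placeholders while keeping the surrounding reasoning steps intact.
from dataclasses import dataclass

@dataclass
class ForgetSpan:
    start: int  # character offset where the sensitive segment begins
    end: int    # character offset where it ends (exclusive)

# Benign filler that keeps the trace structurally readable.
PLACEHOLDER = "[reasoning step withheld]"

def redact_cot(cot: str, spans: list[ForgetSpan]) -> str:
    """Replace each forget-relevant span with a placeholder, preserving
    the logical flow of the chain-of-thought rather than deleting steps."""
    out, cursor = [], 0
    for span in sorted(spans, key=lambda s: s.start):
        out.append(cot[cursor:span.start])  # keep text before the span
        out.append(PLACEHOLDER)             # substitute, don't delete
        cursor = span.end
    out.append(cot[cursor:])                # keep the remainder
    return "".join(out)

# Example: redact a patient identifier inside a medical reasoning trace.
cot = "Step 1: The patient is John Doe, age 54. Step 2: Given the age, risk is elevated."
start = cot.find("John Doe")
print(redact_cot(cot, [ForgetSpan(start, start + len("John Doe"))]))
```

The redacted trace still reads as a complete argument ("Step 1 ... Step 2 ..."), which is the point: the model is trained against targets whose reasoning structure is preserved even though the sensitive content is gone.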
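The summary only names a "feature replacement unlearning loss" without giving its form, so the following is a guess at one plausible shape, assuming PyTorch: match the model's hidden features on forget-span tokens to the features of the benign placeholder, while a standard next-token loss on retain data preserves general reasoning. All function and parameter names here are illustrative, not the paper's.

```python
# Hypothetical feature-replacement style unlearning objective (PyTorch).
import torch
import torch.nn.functional as F

def unlearning_loss(
    forget_feats: torch.Tensor,       # (num_forget_tokens, d): features on forget spans
    placeholder_feats: torch.Tensor,  # (num_forget_tokens, d): target placeholder features
    retain_logits: torch.Tensor,      # (num_retain_tokens, vocab): logits on retain data
    retain_labels: torch.Tensor,      # (num_retain_tokens,): gold next-token ids
    alpha: float = 1.0,               # trade-off between forgetting and retention
) -> torch.Tensor:
    # Feature replacement term: pull forget-span features toward the placeholder,
    # suppressing the likelihood of regenerating the forgotten content.
    replace_term = F.mse_loss(forget_feats, placeholder_feats)
    # Retention term: ordinary language-modeling loss keeps general reasoning intact.
    retain_term = F.cross_entropy(retain_logits, retain_labels)
    return alpha * replace_term + retain_term
```

The design choice this sketch encodes is the one the key points emphasize: forgetting is expressed as substitution toward a benign target rather than as gradient ascent away from the forget data, which is what would otherwise risk corrupting the model's general reasoning.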
Related Articles
[R] The ECIH: Modeling Agentic Identity as an Emergent Relational State
Reddit r/MachineLearning
Google DeepMind Unveils Project Genie: The Dawn of Infinite AI-Generated Game Worlds
Dev.to
Artificial Intelligence and Life in 2030: The One Hundred Year Study on Artificial Intelligence
Dev.to
From Booth Chaos to Scalable Conversations: AI for Hyper-Personalized Follow-Up
Dev.to
AI in 2030: 20 Powerful Trends That Will Shape the Future
Dev.to