Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning

arXiv cs.LG / 4/21/2026


Key Points

  • The paper addresses LLM unlearning after deployment, focusing on the practical problem that the “forget” and “retain” data sets are often not readily available.
  • It proposes a framework called data Pareto improvement that formalizes how better retrieval can expand the achievable trade-off between forgetting and retention.
  • The authors introduce RASLIK (Randomized Antipodal Search on Linearized Influence Kernel), a retrieval algorithm combining permutation-projection hashing with randomized antipodal search.
  • RASLIK is reported to reduce selection variance, achieve sublinear computational complexity, and deliver improvements in both unlearning quality and efficiency.
  • Across multiple models, datasets, and unlearning methods, RASLIK outperforms deterministic retrieval baselines and even oracle sampling, supporting randomized search as a scalable approach for data-centric unlearning.
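The "data Pareto improvement" idea in the second bullet can be made concrete with a toy dominance check over (forgetting, retention) operating points: a retrieval method improves the data side of unlearning if it adds points that dominate part of the old frontier. This is a generic Pareto-frontier sketch under that reading, not the paper's formal definition; the function names are hypothetical.

```python
def dominates(q, p):
    # q dominates p if it is at least as good on both axes
    # (forget_score, retain_score) and strictly better on one.
    return q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])

def pareto_frontier(points):
    """Return the non-dominated (forget_score, retain_score) pairs,
    where larger is better on both axes."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy operating points: the last one is dominated by (0.7, 0.7),
# so it drops off the achievable trade-off frontier.
ops = [(0.9, 0.5), (0.5, 0.9), (0.7, 0.7), (0.6, 0.6)]
frontier = pareto_frontier(ops)
```

Under this reading, "expanding the frontier" means a new retrieval scheme contributes points that dominate previously frontier-optimal ones.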

Abstract

Large language models (LLMs) sometimes memorize undesirable knowledge, which must be removed after deployment. Prior work on machine unlearning has focused largely on optimization methods that adjust parameters to enforce forgetting while preserving retention. However, these approaches assume that the forget and retain sets are readily available, which rarely holds in practice. Unlearning is typically triggered by an undesired generation at inference time, making the retrieval of relevant data the central challenge. We introduce the notion of data Pareto improvement for LLM unlearning, which formalizes how retrieval can expand the achievable trade-off frontier between forgetting and retention. To realize this principle, we propose Randomized Antipodal Search on Linearized Influence Kernel (RASLIK), a retrieval algorithm that combines permutation-projection hashing with randomized antipodal search. RASLIK reduces selection variance, achieves sublinear complexity, and yields a double gain in both quality and efficiency. Across multiple models, datasets, and unlearning algorithms, RASLIK consistently outperforms deterministic baselines and even oracle sampling, establishing randomized search as a principled and scalable solution for data-centric unlearning.
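The abstract describes RASLIK only at a high level (permutation-projection hashing combined with randomized antipodal search over a linearized influence kernel), so the exact construction is not reproducible from this summary. The sketch below illustrates the general flavor with SimHash-style sign random projections: the query's bit signature indexes similar candidates, its antipode (all bits flipped) indexes candidates pointing the opposite way in the kernel space, and repeated randomly perturbed probes reduce the variance of any single deterministic lookup. All names, parameters, and the hashing scheme itself are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def simhash_signatures(X, W):
    # One bit per random hyperplane: sign of the projection (SimHash-style).
    return X @ W.T > 0

def randomized_antipodal_search(q, X, W, n_probes=8, flip_p=0.1):
    """Illustrative retrieval sketch (not RASLIK itself): score corpus
    points by signature agreement with randomly perturbed probes of the
    query signature (similar candidates) and of its antipode (opposing
    candidates). Averaging over probes damps single-hash variance."""
    sigs = simhash_signatures(X, W)        # (n, b) corpus bit signatures
    q_sig = W @ q > 0                      # (b,) query bit signature
    similar_votes = np.zeros(len(X))
    antipodal_votes = np.zeros(len(X))
    for _ in range(n_probes):
        noise = rng.random(q_sig.shape) < flip_p   # random bit flips
        probe = q_sig ^ noise                      # perturbed query probe
        anti = ~probe                              # antipodal probe
        similar_votes += (sigs == probe).sum(axis=1)
        antipodal_votes += (sigs == anti).sum(axis=1)
    # Indices ranked by agreement, most votes first.
    return np.argsort(-similar_votes), np.argsort(-antipodal_votes)

# Synthetic check: five points near q, five near -q.
check_rng = np.random.default_rng(1)
q = np.array([1.0, 0.0, 0.0, 0.0])
X = np.vstack([q + 0.01 * check_rng.standard_normal((5, 4)),
               -q + 0.01 * check_rng.standard_normal((5, 4))])
W = check_rng.standard_normal((16, 4))
similar, antipodal = randomized_antipodal_search(q, X, W)
```

On this synthetic data, the top similar candidate comes from the cluster aligned with `q` and the top antipodal candidate from the cluster aligned with `-q`, which is the behavior a forget/retain retrieval split would rely on.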