Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
arXiv cs.CV / 4/6/2026
Key Points
- The paper argues that existing training-based “visual concept unlearning” confounds evaluation: fine-tuning on a small forget set already degrades general capability before any unlearning effect is measured.
- It introduces VLM-UnBench, a benchmark for training-free visual concept unlearning that spans multiple forgetting levels, datasets, and concept axes, with probe and evaluation conditions designed to distinguish true forgetting from mere instruction-following.
- Across many VLM configurations and evaluation setups, realistic unlearning prompts yield forget accuracy close to the no-instruction baseline; meaningful improvements appear only under special “oracle” conditions that effectively reveal the target concept (see the evaluation sketch after this list).
- Object and scene concepts prove especially resistant to suppression, and instruction-tuned models retain relevant visual knowledge even when explicitly instructed to forget it.
- Overall, the results highlight a gap between prompt-level suppression (instruction compliance) and true visual concept erasure (removal of the underlying representations).
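
The comparison in the third point can be made concrete with a small evaluation loop. The sketch below is illustrative, not the paper's actual protocol: `query_vlm`, the `UNLEARN_PREFIX` wording, the probe question, and the substring-match scoring are all assumptions, and "forget accuracy" is taken to mean accuracy on the forget set (so lower is better for unlearning).

```python
"""Minimal sketch of a prompt-level unlearning evaluation, assuming a
generic VLM callable `query_vlm(image_path, prompt) -> answer`."""

from typing import Callable, Iterable, Tuple

# Hypothetical unlearning instruction; the benchmark's real prompts may differ.
UNLEARN_PREFIX = (
    "Forget everything you know about the concept '{concept}'. "
    "Answer as if you had never encountered it. "
)

def forget_accuracy(
    samples: Iterable[Tuple[str, str]],    # (image_path, target_concept) pairs
    query_vlm: Callable[[str, str], str],  # (image_path, prompt) -> text answer
    use_unlearn_prompt: bool = True,
) -> float:
    """Fraction of samples where the model still names the target concept.

    High forget accuracy means the concept is NOT forgotten; a successful
    unlearning prompt should drive this value down relative to the
    no-instruction baseline.
    """
    hits = total = 0
    for image_path, concept in samples:
        question = "What is the main object or scene in this image?"
        if use_unlearn_prompt:
            question = UNLEARN_PREFIX.format(concept=concept) + question
        answer = query_vlm(image_path, question)
        hits += int(concept.lower() in answer.lower())
        total += 1
    return hits / max(total, 1)

if __name__ == "__main__":
    # Toy stub that ignores the instruction entirely, mimicking the paper's
    # finding that realistic prompts barely move forget accuracy.
    def stub_vlm(image_path: str, prompt: str) -> str:
        return "A photo of a dog."

    data = [("img_001.jpg", "dog"), ("img_002.jpg", "dog")]
    baseline = forget_accuracy(data, stub_vlm, use_unlearn_prompt=False)
    prompted = forget_accuracy(data, stub_vlm, use_unlearn_prompt=True)
    print(f"baseline forget accuracy: {baseline:.2f}")
    print(f"with unlearning prompt:   {prompted:.2f}")  # ~unchanged, per the paper
```

In this framing, the paper's headline result is that `forget_accuracy(..., use_unlearn_prompt=True)` stays close to the no-instruction baseline under realistic prompts, and only "oracle" conditions that reveal the target concept produce a meaningful drop.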