VLA-Forget: Vision-Language-Action Unlearning for Embodied Foundation Models
arXiv cs.CV / 4/7/2026
Key Points
- The paper identifies a specific “unlearning” problem for vision-language-action (VLA) embodied foundation models: removing unsafe, spurious, or privacy-sensitive behaviors can inadvertently degrade perception, language grounding, or action control.
- It argues that the knowledge underlying undesirable behaviors is often distributed across the vision encoder, cross-modal projector, language backbone/reasoning layers, and action-generating blocks, so single-module or conventional (vision-only or language-only) unlearning approaches are insufficient.
- The proposed method, VLA-Forget, uses a hybrid strategy combining ratio-aware selective editing in the perception components with layer-selective reasoning/action unlearning in the upper transformer blocks.
- VLA-Forget jointly optimizes targeted forgetting, perceptual preservation, and reasoning retention via staged updates across the visual encoder, projector, and action-generating layers (see the sketch after this list).
- Reported experiments show improved forgetting efficacy (+10%), better preservation of perceptual specificity (+22%), higher retained reasoning/task success (+9%), and a reduced need for post-quantization recovery (−55%) versus strong unlearning baselines.
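
The summary above only outlines the method, so here is a minimal PyTorch sketch of what a layer-selective, ratio-aware joint unlearning update could look like. All module names (`vision_encoder`, `projector`, `backbone_blocks`, `action_head`), the loss weights, the magnitude-based salience score, and the use of gradient ascent as the forgetting signal are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch of a VLA-Forget-style update: freeze most of the model, fully
# unfreeze the upper reasoning/action blocks, partially unfreeze the
# perception components via gradient masks, then take one joint step.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mask_to_edit_ratio(param: torch.Tensor, edit_ratio: float = 0.05) -> None:
    """Ratio-aware selective editing: hook the gradient so only the top
    `edit_ratio` fraction of entries (by magnitude, a stand-in salience
    score) receive updates."""
    k = max(1, int(edit_ratio * param.numel()))
    thresh = torch.topk(param.detach().abs().flatten(), k).values.min()
    mask = (param.detach().abs() >= thresh).float()
    param.register_hook(lambda grad: grad * mask)


def freeze_except(model: nn.Module, upper_block_ids: list[int],
                  edit_ratio: float = 0.05) -> None:
    """Layer-selective setup: freeze everything, unfreeze the upper
    transformer blocks and action head fully, and unfreeze the perception
    components only through ratio-aware gradient masks."""
    for p in model.parameters():
        p.requires_grad_(False)
    for i in upper_block_ids:  # upper reasoning layers
        for p in model.backbone_blocks[i].parameters():
            p.requires_grad_(True)
    for p in model.action_head.parameters():  # action-generating block
        p.requires_grad_(True)
    for module in (model.vision_encoder, model.projector):  # perception
        for p in module.parameters():
            p.requires_grad_(True)
            mask_to_edit_ratio(p, edit_ratio)


def unlearning_step(model, ref_vision_encoder, optimizer,
                    forget_batch, retain_batch,
                    w_forget=1.0, w_perceive=0.5, w_reason=0.5) -> float:
    """One joint update balancing the three objectives in the summary:
    targeted forgetting, perceptual preservation, reasoning retention."""
    # 1) Targeted forgetting: gradient ascent (negated loss) on the forget set.
    logits_f = model(forget_batch["obs"], forget_batch["instruction"])
    loss_forget = -F.cross_entropy(logits_f, forget_batch["action"])

    # 2) Perceptual preservation: keep vision features close to a frozen
    #    reference copy of the encoder on retained data.
    with torch.no_grad():
        ref_feats = ref_vision_encoder(retain_batch["obs"])
    feats = model.vision_encoder(retain_batch["obs"])
    loss_perceive = F.mse_loss(feats, ref_feats)

    # 3) Reasoning/task retention: standard supervised loss on the retain set.
    logits_r = model(retain_batch["obs"], retain_batch["instruction"])
    loss_reason = F.cross_entropy(logits_r, retain_batch["action"])

    loss = (w_forget * loss_forget
            + w_perceive * loss_perceive
            + w_reason * loss_reason)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

To mirror the staged updates the summary describes, one plausible schedule is to run this step first with only the masked perception parameters unfrozen, then with the upper reasoning/action blocks enabled, rather than updating everything at once.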