VLA-Forget: Vision-Language-Action Unlearning for Embodied Foundation Models

arXiv cs.CV / 4/7/2026


Key Points

  • The paper identifies a specific “unlearning” problem for vision-language-action (VLA) embodied foundation models: removing unsafe, spurious, or privacy-sensitive behaviors can inadvertently degrade perception, language grounding, or action control.
  • It argues that undesirable behavior knowledge is often distributed across the vision encoder, cross-modal projector, language backbone/reasoning layers, and action-generating blocks, making single-module or conventional (standalone vision/language) unlearning approaches insufficient.
  • The proposed method, VLA-Forget, uses a hybrid strategy combining ratio-aware selective editing in the perception components with layer-selective reasoning/action unlearning in the upper transformer blocks.
  • VLA-Forget jointly optimizes targeted forgetting, perceptual preservation, and reasoning retention via staged updates across the visual encoder, projector, and action-generating layers (a minimal sketch of this combined objective follows the list).
  • Reported experiments show improved forgetting efficacy (+10%), better perceptual specificity preservation (+22%), higher retained reasoning/task success (+9%), and reduced need for post-quantization recovery (−55%) versus strong unlearning baselines.
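
The bullets above describe a three-term objective applied in stages. A minimal PyTorch-style sketch of what one such update could look like is given below; the module interfaces (`policy.encode`, the `policy(obs, instruction)` call), the specific loss forms (gradient ascent on forget-set action tokens, feature matching and KL to a frozen reference on retain data), and the loss weights are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of a three-objective unlearning step in the spirit of VLA-Forget:
# push down the likelihood of undesired forget-set actions while anchoring
# perception and reasoning to a frozen reference policy on retain data.
# Interfaces and weights are assumptions for exposition.
import torch
import torch.nn.functional as F

def unlearning_step(policy, ref_policy, forget_batch, retain_batch,
                    w_forget=1.0, w_percept=1.0, w_reason=1.0):
    """Return the combined loss for one staged update."""
    # Targeted forgetting: gradient ascent on forget-set action-token likelihood
    # (minimizing the negated cross-entropy maximizes it).
    forget_logits = policy(forget_batch["obs"], forget_batch["instruction"])
    loss_forget = -F.cross_entropy(forget_logits, forget_batch["action_tokens"])

    # Perceptual preservation: visual/projector features on retain data should
    # stay close to those of the frozen reference copy.
    with torch.no_grad():
        ref_feats = ref_policy.encode(retain_batch["obs"])
    feats = policy.encode(retain_batch["obs"])
    loss_percept = F.mse_loss(feats, ref_feats)

    # Reasoning retention: keep the retain-task action-token distribution close
    # to the reference policy's distribution (KL to the frozen model).
    retain_logits = policy(retain_batch["obs"], retain_batch["instruction"])
    with torch.no_grad():
        ref_logits = ref_policy(retain_batch["obs"], retain_batch["instruction"])
    loss_reason = F.kl_div(F.log_softmax(retain_logits, dim=-1),
                           F.softmax(ref_logits, dim=-1),
                           reduction="batchmean")

    return w_forget * loss_forget + w_percept * loss_percept + w_reason * loss_reason
```

A staged schedule would call this step while enabling gradients only for the visual encoder and projector in the perception stage and only for the upper action-generating blocks in the reasoning/action stage.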

Abstract

Vision-language-action (VLA) models are emerging as embodied foundation models for robotic manipulation, but their deployment introduces a new unlearning challenge: removing unsafe, spurious, or privacy-sensitive behaviors without degrading perception, language grounding, or action control. In OpenVLA-style policies, behavior is produced through a fused visual encoder, a cross-modal projector, and a language backbone that predicts tokenized robot actions, so undesirable knowledge can be distributed across perception, alignment, and reasoning/action layers rather than confined to a single module. Consequently, partial unlearning applied only to the vision stack or only to the language backbone is often insufficient, while conventional unlearning baselines designed for standalone vision or language models may leave the targeted behavior only partially forgotten or incur unnecessary utility loss in embodied settings. We propose VLA-Forget, a hybrid unlearning framework that combines ratio-aware selective editing for perception and cross-modal specificity with layer-selective reasoning/action unlearning for utility-preserving forgetting. VLA-Forget jointly optimizes three objectives (targeted forgetting, perceptual preservation, and reasoning retention) through staged updates over the visual encoder, projector, and upper action-generating transformer blocks. Across forget-set behavior probes and retain-task evaluations, VLA-Forget improves forgetting efficacy by 10%, improves preservation of perceptual specificity by 22%, improves retention of reasoning and task success by 9%, and reduces the need for post-quantization recovery by 55% relative to strong unlearning baselines.
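
As a rough illustration of the two selection mechanisms named in the abstract, the sketch below renders "ratio-aware selective editing" as a per-parameter mask that keeps only weights whose forget-set gradient magnitude dominates their retain-set gradient magnitude, and "layer-selective" unlearning as unfreezing only the top-k backbone blocks. The ratio criterion, the `policy.backbone.blocks` attribute, and the threshold are assumptions for exposition; the paper's actual selection rules may differ.

```python
# Illustrative selection utilities, not the authors' algorithm: a gradient-ratio
# mask for perception-side editing and a top-k block unfreezing helper for
# layer-selective unlearning. All interfaces are assumed.
import torch

def build_edit_masks(policy, forget_loss, retain_loss, ratio_threshold=2.0, eps=1e-8):
    """Return {param_name: bool tensor} keeping only parameters whose
    forget-set gradient magnitude exceeds ratio_threshold times their
    retain-set gradient magnitude. Both losses must come from separate
    forward passes so each has its own autograd graph."""
    params = list(policy.parameters())
    forget_grads = torch.autograd.grad(forget_loss, params, allow_unused=True)
    retain_grads = torch.autograd.grad(retain_loss, params, allow_unused=True)
    masks = {}
    for (name, _), gf, gr in zip(policy.named_parameters(), forget_grads, retain_grads):
        if gf is None:
            continue  # parameter not used by the forget objective
        gr_abs = gr.abs() if gr is not None else torch.zeros_like(gf)
        masks[name] = (gf.abs() / (gr_abs + eps)) > ratio_threshold
    return masks

def select_upper_blocks(policy, top_k=4):
    """Freeze all parameters, then unfreeze only the last top_k transformer
    blocks of the backbone (assumed to live at policy.backbone.blocks)."""
    for p in policy.parameters():
        p.requires_grad_(False)
    for block in policy.backbone.blocks[-top_k:]:
        for p in block.parameters():
            p.requires_grad_(True)
```

In a staged update loop, such masks would typically be used to zero out the gradients of unselected parameters before each optimizer step, so that forgetting edits stay confined to the weights most specific to the forget-set behavior while retain-task utility is left largely untouched.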