Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning
arXiv cs.LG / 3/13/2026
📰 News · Ideas & Deep Analysis · Models & Research
Key Points
- The paper presents a systematic study of continual reinforcement learning (CRL) for large pretrained Vision-Language-Action models across three models and five lifelong RL benchmarks, challenging the conventional belief that continual adaptation requires specialized anti-forgetting machinery.
- It finds that simple sequential fine-tuning with Low-Rank Adaptation (LoRA) achieves high plasticity, minimal forgetting, and strong zero-shot generalization, often outperforming more complex CRL methods.
- The robustness is attributed to a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL, reshaping the stability-plasticity trade-off for continual adaptation.
- Code for the project is released at github.com/UT-Austin-RobIn/continual-vla-rl to support reproducibility and practical experimentation.
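The core recipe the key points describe is sequential fine-tuning where the pretrained weights stay frozen and only low-rank LoRA adapters are updated per task. The sketch below is an illustrative numpy implementation of a LoRA linear layer, not the authors' released code (see the linked repo for that); all names and hyperparameters here are hypothetical.

```python
import numpy as np

class LoRALinear:
    """Frozen pretrained weight W plus a trainable low-rank update B @ A (illustrative sketch)."""

    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        # Pretrained weight: frozen during continual fine-tuning.
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # LoRA factors: the only parameters updated on each new task.
        self.A = rng.standard_normal((rank, d_in)) * 0.01  # small random init
        self.B = np.zeros((d_out, rank))                   # zero init, so the adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # Base path is fixed; only the low-rank path changes across tasks,
        # which bounds how far the policy can drift from the pretrained model.
        return x @ self.W.T + (x @ self.A.T) @ self.B.T * self.scale

layer = LoRALinear(16, 8)
x = np.ones((2, 16))
y0 = layer.forward(x)
# With B zero-initialized, the adapter contributes nothing yet:
assert np.allclose(y0, x @ layer.W.T)
```

Because `B` starts at zero, task 1 begins exactly at the pretrained policy; the on-policy RL updates then move only `A` and `B`, which is one plausible reading of why the paper observes high plasticity with minimal forgetting.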