AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models

arXiv cs.LG / 3/20/2026

📰 News · Developer Stack & Infrastructure · Models & Research

Key Points

  • AcceRL proposes a fully asynchronous and decoupled RL framework that separates training, inference, and rollouts to remove synchronization bottlenecks in Vision-Language-Action models.
  • It is the first to integrate a plug-and-play, trainable world model into a distributed asynchronous RL pipeline to generate virtual experiences.
  • Experiments on the LIBERO benchmark show that AcceRL achieves state-of-the-art performance.
  • The framework exhibits super-linear scaling in throughput and highly efficient hardware utilization.
  • The world-model-augmented variant delivers unprecedented sample efficiency and robust training stability in complex control tasks.
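The decoupled design in the first key point can be sketched with a small producer/consumer loop. This is an illustrative toy, not the AcceRL implementation: all names (`rollout_worker`, `experience_queue`, the dict-based parameter store) are assumptions. The point it demonstrates is that rollout workers and the learner communicate only through a queue, so policy updates never block experience generation and workers tolerate slightly stale weights.

```python
import threading
import queue
import random

# Hypothetical sketch of a decoupled asynchronous RL loop (names are
# illustrative, not from the AcceRL paper): rollout workers push
# trajectories into a queue; the learner consumes them asynchronously.

experience_queue = queue.Queue(maxsize=100)
stop = threading.Event()
params = {"version": 0}          # shared policy parameters (stale reads are OK)
param_lock = threading.Lock()

def rollout_worker(worker_id):
    """Generate experience with a possibly-stale policy snapshot."""
    while not stop.is_set():
        with param_lock:
            version = params["version"]   # snapshot current weights
        # stand-in for env.step() under the current policy
        trajectory = {"worker": worker_id,
                      "policy_version": version,
                      "reward": random.random()}
        experience_queue.put(trajectory)

def learner(num_updates, batch_size=4):
    """Consume batches and update parameters without pausing workers."""
    for _ in range(num_updates):
        batch = [experience_queue.get() for _ in range(batch_size)]
        with param_lock:
            params["version"] += 1        # stand-in for a gradient step

workers = [threading.Thread(target=rollout_worker, args=(i,), daemon=True)
           for i in range(2)]
for w in workers:
    w.start()
learner(10)
stop.set()
print(params["version"])   # 10 updates applied, workers never synchronized
```

Because the queue is the only point of contact, the learner's update rate and the workers' rollout rate can scale independently, which is the property that removes the synchronization barriers the paper targets.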

Abstract

Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models faces significant challenges in computational efficiency and data acquisition. We propose AcceRL, a fully asynchronous and decoupled RL framework designed to eliminate synchronization barriers by physically isolating training, inference, and rollouts. Crucially, AcceRL is the first to integrate a plug-and-play, trainable world model into a distributed asynchronous RL pipeline to generate virtual experiences. Experiments on the LIBERO benchmark demonstrate that AcceRL achieves state-of-the-art (SOTA) performance. Systematically, it exhibits super-linear scaling in throughput and highly efficient hardware utilization. Algorithmically, the world-model-augmented variant delivers unprecedented sample efficiency and robust training stability in complex control tasks.
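The abstract's "plug-and-play world model generating virtual experiences" can be illustrated with a Dyna-style toy: real transitions train a dynamics model, which is then queried to augment the replay buffer with imagined transitions. Everything here (`ToyWorldModel`, `real_step`, the memorization-based "training") is a hedged stand-in, not the paper's method.

```python
import random

# Illustrative Dyna-style sketch (not the AcceRL implementation): a learned
# world model supplies "virtual" transitions alongside real environment data.

class ToyWorldModel:
    """Stand-in learned dynamics: predicts (next_state, reward) for (s, a)."""
    def __init__(self):
        self.transitions = {}                 # (state, action) -> (next, reward)

    def observe(self, s, a, s2, r):
        self.transitions[(s, a)] = (s2, r)    # "train" by memorizing

    def imagine(self, s, a):
        # virtual experience; fall back to a no-op prediction if unseen
        return self.transitions.get((s, a), (s, 0.0))

def real_step(s, a):
    """Stand-in environment: cyclic 5-state chain with a goal reward."""
    nxt = (s + a) % 5
    return nxt, 1.0 if nxt == 0 else 0.0

model = ToyWorldModel()
replay = []
state = 0
for _ in range(20):                           # collect real experience
    action = random.randint(0, 1)
    nxt, reward = real_step(state, action)
    model.observe(state, action, nxt, reward)
    replay.append((state, action, nxt, reward))
    state = nxt

# augment the buffer with virtual rollouts imagined by the world model
for s, a, _, _ in list(replay):
    replay.append((s, a) + model.imagine(s, a))

print(len(replay))   # 40: half real, half imagined transitions
```

The sample-efficiency claim in the abstract corresponds to the second loop: each real environment interaction is amplified by cheap model-generated transitions, so the learner sees more data per physical rollout.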