Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate

arXiv cs.RO · 30 Mar 2026


Key Points

  • The paper introduces “Realtime-VLA V2,” focusing on practical end-to-end deployment techniques that let VLA-driven robots run quickly while maintaining accuracy and dexterity in real-world tasks.
  • It builds on prior work that targeted fast GPU inference, but now addresses the full robotics deployment pipeline, covering calibration, planning & control, and learning-based methods for selecting optimal execution speed.
  • Experiments show the robot executing at speeds on par with casual human operation, approaching the hardware limits of a lightweight robotic arm.
  • The authors provide unaccelerated videos and inference traces to support evaluation and replication of the reported real-time behavior.
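The paper learns the optimal execution speed; independent of that learned component, any such method must respect the arm's kinematic ceiling. The sketch below illustrates that ceiling: the largest uniform time-rescaling of a joint trajectory that keeps every joint within its velocity limit. The function, variable names, and numbers are hypothetical, not taken from the paper.

```python
def max_speed_factor(traj, dt, vel_limits):
    """Largest uniform speed-up of a joint trajectory that keeps every
    joint under its velocity limit (illustrative helper, not the
    paper's learned method).

    traj: list of joint-position tuples sampled every dt seconds.
    vel_limits: per-joint velocity limits (same units per second).
    """
    n_joints = len(vel_limits)
    peak = [0.0] * n_joints
    # Peak demanded velocity per joint at the nominal playback rate.
    for prev, curr in zip(traj, traj[1:]):
        for j in range(n_joints):
            v = abs(curr[j] - prev[j]) / dt
            peak[j] = max(peak[j], v)
    # Joints that never move impose no constraint.
    factors = [lim / p for lim, p in zip(vel_limits, peak) if p > 0]
    return min(factors) if factors else float("inf")

# Example: a 2-joint trajectory sampled at 50 Hz with hypothetical
# velocity limits of 2 rad/s per joint.
traj = [(0.0, 0.0), (0.01, 0.02), (0.03, 0.05)]
print(max_speed_factor(traj, dt=0.02, vel_limits=(2.0, 2.0)))
```

Playing the trajectory back faster than this factor would demand velocities the hardware cannot deliver, which is one reason a learned speed selector is useful on a lightweight arm.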

Abstract

When deploying VLA models to real-world robotic tasks, execution speed matters. In previous work (arXiv:2510.26742) we analyzed how to make the neural computation of VLAs fast on GPUs, but left open the question of how to actually deploy the VLA system on real robots. In this report we describe a set of practical techniques that achieve the end-to-end result of running a VLA-driven robot at high speed in real-world tasks that demand both accuracy and dexterity. The technology stack spans calibration, planning & control, and a learning-based method for identifying the optimal execution speed. In the tasks we show, the robot executes at a speed on par with casual human operation, approaching the hardware limit of our lightweight arm. Unaccelerated videos and inference traces are provided at https://dexmal.github.io/realtime-vla-v2/.