FASTER: Rethinking Real-Time Flow VLAs

arXiv cs.RO / 4/30/2026


Key Points

  • The paper argues that for real-time Vision-Language-Action (VLA) deployment, existing asynchronous inference methods often focus on trajectory smoothness while overlooking the latency needed to react to environmental changes.
  • It shows that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon, which gives a framework for analyzing how quickly a chunked policy can react.
  • The authors identify a key bottleneck in flow-based VLA systems caused by using a constant schedule, which can force all sampling steps to finish before movement starts.
  • They propose FASTER (Fast Action Sampling for ImmediaTE Reaction), which uses a Horizon-Aware Schedule to prioritize near-term actions during flow sampling. This compresses the denoising of the immediate reaction tenfold, into a single sampling step (e.g., in π0.5 and X-VLA), while preserving long-horizon trajectory quality.
  • Experiments on real robots, including a highly dynamic table tennis task, demonstrate substantially lower effective reaction latency, especially on consumer-grade GPUs, enabling more responsive real-world behavior for generalist policies.
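
The uniform reaction-time claim in the key points can be illustrated with a small simulation. This is a minimal sketch under a simplifying assumption that is not taken from the paper: an environmental change lands at a uniformly random phase of the current execution cycle, the policy can only observe it when the next inference starts, and the first reacting action arrives one TTFA later. The function names and the `control_period_s` parameter are hypothetical.

```python
import random


def reaction_time_bounds(ttfa_s, execution_horizon_steps, control_period_s):
    """Return (low, high) of the assumed uniform reaction-time distribution."""
    # Duration of one execution cycle: steps executed before the next inference.
    cycle_s = execution_horizon_steps * control_period_s
    return ttfa_s, ttfa_s + cycle_s


def sample_reaction_times(ttfa_s, execution_horizon_steps, control_period_s,
                          n, seed=0):
    """Draw n reaction times under the assumed event-arrival model."""
    rng = random.Random(seed)
    cycle_s = execution_horizon_steps * control_period_s
    samples = []
    for _ in range(n):
        # Assumption: the event lands at a uniformly random phase of the cycle.
        phase_s = rng.uniform(0.0, cycle_s)
        # Wait until the next inference starts, then TTFA for its first action.
        wait_s = cycle_s - phase_s
        samples.append(wait_s + ttfa_s)
    return samples
```

Under this model the reaction time is uniform on [TTFA, TTFA + execution horizon duration], so its mean is TTFA plus half a cycle. Calling `sample_reaction_times(0.1, 10, 0.02, n=20000)` and averaging the result is one way to check that. Either a shorter TTFA or a shorter execution horizon reduces the expected reaction time.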

Abstract

Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient and forces the system to complete all sampling steps before any movement can start, forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction tenfold (e.g., in π0.5 and X-VLA) into a single step, while preserving the quality of the long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, prove that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
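
To make the contrast between a constant schedule and a horizon-aware one concrete, here is a minimal sketch. It is not the paper's schedule: the linear interpolation of per-index step counts, the function names, and the convention that flow time runs from 0 (noise) to 1 (fully denoised) are all assumptions chosen for illustration. Each entry `times[k][h]` is the flow time of action index `h` in the chunk after `k` sampling steps.

```python
def constant_schedule(horizon, num_steps):
    """All action indices share one flow time, so none is ready before the last step."""
    return [[k / num_steps] * horizon for k in range(num_steps + 1)]


def horizon_aware_schedule(horizon, num_steps):
    """Hypothetical schedule: index 0 finishes in 1 step, the last index in num_steps."""
    times = []
    for k in range(num_steps + 1):
        row = []
        for h in range(horizon):
            # Assumption: the step budget grows linearly with the action index.
            frac = h / (horizon - 1) if horizon > 1 else 0.0
            steps_for_h = 1 + round(frac * (num_steps - 1))
            # Flow time is clamped at 1.0 once this index is fully denoised.
            row.append(min(1.0, k / steps_for_h))
        times.append(row)
    return times


def first_ready_step(times):
    """For each action index, the first sampling step at which flow time reaches 1."""
    horizon = len(times[0])
    return [
        next(k for k, row in enumerate(times) if row[h] >= 1.0)
        for h in range(horizon)
    ]
```

With `num_steps=10`, `first_ready_step(constant_schedule(5, 10))` reports step 10 for every index, whereas the horizon-aware variant has the first action ready after a single step and the last after all 10. That matches the tenfold compression of the immediate reaction described in the abstract, and in a streaming pipeline the early actions could be sent to the robot while later ones are still being refined.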