AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models
arXiv cs.AI / 3/12/2026
📰 News · Models & Research
Key Points
- AR-VLA introduces a standalone autoregressive Action Expert that generates actions as a continuous causal sequence with a long-lived memory, improving context-awareness over existing vision-language-action models.
- It features a re-anchoring mechanism to account for perception staleness and to synchronize asynchronous vision-language-action modalities during training and inference.
- Experiments on simulated and real-robot manipulation tasks show AR-VLA can replace chunk-based action heads, delivering smoother trajectories and task success comparable to or higher than state-of-the-art reactive VLAs.
- The approach enables independent pretraining of kinematic syntax and modular integration with heavy perception backbones, addressing the fast-control / slow-reasoning frequency mismatch in robotics policies.
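The core idea in the points above can be sketched in a few lines: an action expert that keeps a long-lived recurrent memory and emits one action per causal step at the fast control rate, while a `re_anchor` call refreshes its (possibly stale) perception context whenever the slower vision-language backbone produces a new embedding. This is a minimal illustrative sketch, not the paper's architecture; all names (`ActionExpert`, `re_anchor`, the tanh recurrence) and dimensions are assumptions.

```python
import numpy as np

class ActionExpert:
    """Hypothetical minimal autoregressive action expert.

    Unlike a chunk-based head, which regenerates a fixed action block
    from scratch at every perception step, this keeps a persistent
    hidden state across control steps (the "long-lived memory").
    """

    def __init__(self, obs_dim, act_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
        self.W_a = rng.standard_normal((hidden_dim, act_dim)) * 0.1
        self.W_o = rng.standard_normal((hidden_dim, obs_dim)) * 0.1
        self.W_out = rng.standard_normal((act_dim, hidden_dim)) * 0.1
        self.h = np.zeros(hidden_dim)    # long-lived memory across steps
        self.ctx = np.zeros(obs_dim)     # last perception embedding

    def re_anchor(self, obs_embedding):
        # Refresh the perception context when a new vision-language
        # embedding arrives at the slower reasoning frequency; between
        # calls, the expert keeps acting on this (stale) anchor.
        self.ctx = np.asarray(obs_embedding, dtype=float)

    def step(self, prev_action):
        # One causal step: memory evolves from the previous action and
        # the current (re-anchored) context, then emits the next action.
        self.h = np.tanh(self.W_h @ self.h
                         + self.W_a @ prev_action
                         + self.W_o @ self.ctx)
        return self.W_out @ self.h


# Fast control loop between two slow perception updates.
expert = ActionExpert(obs_dim=4, act_dim=2, hidden_dim=8)
expert.re_anchor(np.ones(4))         # slow vision-language update
a = np.zeros(2)
traj = []
for _ in range(5):                   # fast autoregressive control steps
    a = expert.step(a)
    traj.append(a)
```

The frequency mismatch is resolved because `step` can run at the control rate regardless of how often `re_anchor` fires, and the causal memory `self.h` is what makes consecutive actions a smooth continuous sequence rather than independent chunks.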
Related Articles
[D] Matryoshka Representation Learning
Reddit r/MachineLearning
Two new Qwen3.5 “Neo” fine‑tunes focused on fast, efficient reasoning
Reddit r/LocalLLaMA
HKIC, Gobi Partners and HKU team up for fund backing university research start-ups
SCMP Tech
Yann LeCun’s New LeWorldModel (LeWM) Research Targets JEPA Collapse in Pixel-Based Predictive World Modeling
MarkTechPost
Streaming experts
Simon Willison's Blog