AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models
arXiv cs.AI / 3/12/2026
📰 News · Models & Research
Key Points
- AR-VLA introduces a standalone autoregressive Action Expert that generates actions as a continuous causal sequence over a long-lived memory, improving context awareness over existing vision-language-action (VLA) models (see the first sketch after this list).
- It features a re-anchoring mechanism that accounts for perception staleness and synchronizes the asynchronous vision, language, and action modalities during both training and inference.
- Experiments on simulated and real-robot manipulation tasks show AR-VLA can replace chunk-based action heads (contrasted in the second sketch below) while delivering smoother trajectories and task success comparable to or higher than that of state-of-the-art reactive VLAs.
- The approach enables independent pretraining of the action expert on kinematic syntax and modular integration with heavy perception backbones, addressing the fast-control/slow-reasoning frequency mismatch in robotics policies.
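
The first two bullets describe one control loop: the action expert autoregresses over its own action history at the control rate, while a slower perception stack occasionally injects a fresh scene embedding to re-anchor the sequence. Below is a minimal PyTorch sketch of that idea, not the authors' implementation; `ActionExpert`, its `step` method, the obs/action token layout, the 5-tick re-anchor period, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ActionExpert(nn.Module):
    """Causal decoder that emits one continuous action per control tick."""

    def __init__(self, act_dim=7, obs_dim=64, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.act_in = nn.Linear(act_dim, d_model)   # embeds past actions as tokens
        self.obs_in = nn.Linear(obs_dim, d_model)   # embeds perception anchors as tokens
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.act_out = nn.Linear(d_model, act_dim)

    def step(self, memory, token):
        """Append one token to the long-lived causal memory, decode the next action."""
        memory = memory + [token]                   # persistent context across ticks
        seq = torch.stack(memory).unsqueeze(0)      # (1, T, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
        hidden = self.backbone(seq, mask=mask)      # causal attention over the history
        return memory, self.act_out(hidden[0, -1])


expert = ActionExpert()
memory, action = [], torch.zeros(7)
with torch.no_grad():
    for tick in range(20):
        if tick % 5 == 0:
            # Re-anchoring (hypothetical schedule): the slow VLM has produced a
            # fresh scene embedding, so inject it to correct perception staleness.
            fresh_obs = torch.randn(64)             # stand-in for a VLM embedding
            memory, action = expert.step(memory, expert.obs_in(fresh_obs))
        else:
            # Pure autoregression: condition only on the running action history.
            memory, action = expert.step(memory, expert.act_in(action))
```

The `tick % 5` branch is also where the fourth bullet's frequency mismatch shows up: the action expert runs every tick, while perception tokens arrive only when the heavy backbone finishes, so the two can be trained and deployed at different rates.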
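For contrast, the chunk-based action heads that the third bullet says AR-VLA replaces decode a fixed block of actions from a single observation and then execute it open-loop. This sketch, again with hypothetical names and dimensions, shows the resulting gap: no feedback can enter mid-chunk, and consecutive chunks need not join smoothly.

```python
import torch
import torch.nn as nn

HORIZON, OBS_DIM, ACT_DIM = 8, 64, 7
chunk_head = nn.Linear(OBS_DIM, HORIZON * ACT_DIM)  # stand-in for a chunked action head

obs = torch.randn(OBS_DIM)                          # one (soon-to-be-stale) observation
chunk = chunk_head(obs).view(HORIZON, ACT_DIM)      # all HORIZON actions fixed up front
for action in chunk:
    # Open loop: the whole chunk executes before perception can intervene again,
    # and the trajectory can jump at the boundary to the next chunk.
    pass
```

Because the autoregressive expert in the first sketch conditions every action on all previous ones, consecutive actions share context rather than being cut at chunk boundaries, which is consistent with the smoother trajectories the key points report.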
Related Articles
- Report: Observations of "Self-Referential Recursion" and "Stateful Emulation" in LLMs (note)
- Dialogue with Master Zhuge Liang Kongming (a ChatGPT roleplay), Part 45: "Galactic Civilization and the Dark Matter Engine" (note)
- GPT-5.4 mini/nano Launches! A Compact, High-Performance Model That Is 2x Faster and Available on the Free Plan (note)
- Why a Perfect-Memory AI Agent Without Persona Drift is Architecturally Impossible (Dev.to)
- OCP: Orthogonal Constrained Projection for Sparse Scaling in Industrial Commodity Recommendation (arXiv cs.LG)