Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution
arXiv cs.RO / 3/26/2026
💬 Opinion · Signals & Early Trends · Models & Research
Key Points
- The paper introduces Xiaomi-Robotics-0, an open-sourced vision-language-action (VLA) model designed for high-performance, real-time robot control.
- It uses a training approach that pre-trains on large-scale cross-embodiment robot trajectories and vision-language data while mitigating catastrophic forgetting to preserve visual-semantic knowledge.
- Post-training techniques target asynchronous execution to reduce inference latency during real-robot rollouts.
- The deployment strategy aligns the timesteps of consecutive predicted action chunks to produce continuous, seamless real-time behavior.
- Experiments show state-of-the-art results on simulation benchmarks and strong performance on two demanding bimanual real-robot manipulation tasks, with fast rollouts on a consumer-grade GPU; code and checkpoints are open-sourced via the project site.
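The paper's exact alignment scheme is not detailed in this summary, but the core idea behind the last two points — running inference asynchronously and aligning consecutive action chunks by timestep so the robot never stalls — can be sketched as follows. Everything here is an assumption for illustration: the chunk length, the latency value, and the `predict_chunk` helper are hypothetical stand-ins, not the model's actual API.

```python
CHUNK_LEN = 16       # actions per predicted chunk (assumed)
LATENCY_STEPS = 4    # inference latency in control steps (assumed)

def predict_chunk(start_t):
    """Stand-in for VLA inference: each action is tagged with the
    absolute control timestep it was predicted for."""
    return list(range(start_t, start_t + CHUNK_LEN))

def rollout(num_steps=40):
    """Execute actions continuously while inference runs 'in flight'."""
    executed = []
    t = 0
    chunk = predict_chunk(0)   # first chunk, predicted at t = 0
    inflight = None            # (ready_at, start_t) of pending inference

    while len(executed) < num_steps:
        # Trigger the next inference once the current chunk is about to
        # run short, so its replacement arrives before we starve.
        if inflight is None and len(chunk) <= LATENCY_STEPS:
            inflight = (t + LATENCY_STEPS, t)  # predicted from obs at t

        executed.append(chunk.pop(0))          # act at every control tick
        t += 1

        if inflight is not None and t >= inflight[0]:
            _, start_t = inflight
            fresh = predict_chunk(start_t)
            # Timestep alignment: the new chunk was predicted from an
            # observation LATENCY_STEPS in the past, so drop the actions
            # whose timesteps have already elapsed. Execution continues
            # without a gap or a repeated action.
            chunk = [a for a in fresh if a >= t]
            inflight = None

    return executed
```

Under these assumptions, `rollout(40)` yields an unbroken sequence of per-timestep actions: each replacement chunk picks up exactly where the previous one left off, which is the "continuous, seamless real-time behavior" the deployment strategy aims for.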