ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment
arXiv cs.RO / 3/25/2026
Key Points
- ABot-PhysWorld is a 14B-parameter diffusion transformer video world model for robot manipulation. It aims to generate videos that are physically plausible, visually realistic, and action-controllable, in contrast to models trained on likelihood alone, whose outputs can be physically inconsistent.
- The model is trained on a curated dataset of 3 million manipulation clips with physics-aware annotations. A DPO-based post-training stage with decoupled discriminators suppresses unphysical behaviors while preserving visual quality.
- It includes a parallel context block that supports precise spatial action injection to enable cross-embodiment control.
- The authors introduce EZSbench, a training-independent embodied zero-shot benchmark that separates evaluation of physical realism from action alignment using a decoupled protocol, covering both real and synthetic unseen task-scene combinations.
- The authors report new state-of-the-art results on PBench and EZSbench, claiming improvements over Veo 3.1 and Sora v2 Pro in physical plausibility and trajectory consistency. They plan to release EZSbench for standardized evaluation.
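
For readers unfamiliar with the DPO objective mentioned in the key points, here is a minimal sketch of the generic pairwise DPO loss for a single preferred/rejected pair. It is not the authors' implementation: the summary does not describe how the loss is adapted to a video diffusion model or how the decoupled discriminators produce preference pairs, so the function name, argument names, and the `beta` value are all illustrative assumptions.

```python
import math


def dpo_loss(policy_logp_preferred, policy_logp_rejected,
             ref_logp_preferred, ref_logp_rejected, beta=0.1):
    """Generic DPO loss for one preference pair (hypothetical helper).

    Each argument is a scalar log-probability of a sample under either
    the trainable policy or the frozen reference model. `beta` is an
    assumed temperature, not a value taken from the paper.
    """
    # How much more the policy favors each sample than the reference does.
    preferred_gain = policy_logp_preferred - ref_logp_preferred
    rejected_gain = policy_logp_rejected - ref_logp_rejected
    margin = preferred_gain - rejected_gain

    # loss = -log(sigmoid(beta * margin)) = softplus(-beta * margin),
    # written in a numerically stable form.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When policy and reference agree, the margin is zero and the loss is log(2).
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)

# Raising the policy's log-probability of the preferred (for example,
# physically plausible) sample lowers the loss.
improved = dpo_loss(-3.0, -5.0, -5.0, -5.0)
```

In this sketch the loss falls as the policy shifts probability toward the preferred sample relative to the reference, which is the general mechanism by which preference-based post-training can discourage unwanted outputs.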