ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment

arXiv cs.RO · March 25, 2026


Key Points

  • ABot-PhysWorld is a 14B diffusion transformer video world model aimed at generating physically plausible, visually realistic, and action-controllable robot manipulation videos, in contrast to the physically inconsistent outputs typical of models trained with purely likelihood-based objectives.
  • The model is trained on a curated dataset of 3 million manipulation clips with physics-aware annotations and uses a DPO-based post-training approach with decoupled discriminators to suppress unphysical behaviors while keeping visual quality.
  • It includes a parallel context block that supports precise spatial action injection to enable cross-embodiment control.
  • The authors introduce EZSbench, a training-independent embodied zero-shot benchmark that separates evaluation of physical realism from action alignment using a decoupled protocol, covering both real and synthetic unseen task-scene combinations.
  • ABot-PhysWorld reports new state-of-the-art results on PBench and EZSbench, claiming improvements over Veo 3.1 and Sora v2 Pro for physical plausibility and trajectory consistency, and plans to release EZSbench for standardized evaluation.
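The DPO-based post-training described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `pick_preference_pair` selection rule and the discriminator-score fields are hypothetical, and only the `dpo_loss` formula is the standard Direct Preference Optimization objective. The idea it illustrates is that decoupled discriminators (one for physics, one for visual quality) can rank generated clips to build the preference pairs that DPO trains on.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO objective for one preference pair: push the policy's
    log-probability margin (relative to a frozen reference model) toward
    favoring the preferred sample."""
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    return -math.log(sigmoid(margin))

def pick_preference_pair(clips, physics_score, visual_score):
    """Hypothetical decoupled selection: rank candidate clips primarily by a
    physics discriminator, breaking ties with a visual-quality discriminator,
    so unphysical behavior is penalized without sacrificing visual fidelity."""
    ranked = sorted(clips, key=lambda c: (physics_score(c), visual_score(c)))
    return ranked[-1], ranked[0]  # (preferred, rejected)

# Toy usage with made-up discriminator scores attached to each clip.
clips = [{"id": "a", "phys": 0.9, "vis": 0.8},
         {"id": "b", "phys": 0.2, "vis": 0.9}]
win, lose = pick_preference_pair(clips, lambda c: c["phys"], lambda c: c["vis"])
```

With equal policy and reference log-probabilities the loss is `log 2`, and it decreases as the policy assigns relatively more mass to the preferred clip, which is the behavior DPO exploits to suppress the losing (unphysical) samples.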

Abstract

Video-based world models offer a powerful paradigm for embodied simulation and planning, yet state-of-the-art models often generate physically implausible manipulations (such as object penetration and anti-gravity motion) due to training on generic visual data and likelihood-based objectives that ignore physical laws. We present ABot-PhysWorld, a 14B Diffusion Transformer model that generates visually realistic, physically plausible, and action-controllable videos. Built on a curated dataset of three million manipulation clips with physics-aware annotation, it uses a novel DPO-based post-training framework with decoupled discriminators to suppress unphysical behaviors while preserving visual quality. A parallel context block enables precise spatial action injection for cross-embodiment control. To better evaluate generalization, we introduce EZSbench, the first training-independent embodied zero-shot benchmark combining real and synthetic unseen robot-task-scene combinations. It employs a decoupled protocol to separately assess physical realism and action alignment. ABot-PhysWorld achieves new state-of-the-art performance on PBench and EZSbench, surpassing Veo 3.1 and Sora v2 Pro in physical plausibility and trajectory consistency. We will release EZSbench to promote standardized evaluation in embodied video generation.
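The "parallel context block" for spatial action injection is only named, not specified, in the abstract. One common way such conditioning is realized in diffusion transformers, shown below as a hedged NumPy sketch, is to run self-attention over video tokens in parallel with cross-attention to action tokens and sum the two streams into the residual path; all function and variable names here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over 2D (tokens, dim) arrays.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def parallel_context_block(video_tokens, action_tokens, weights):
    """Hypothetical sketch: self-attention over video tokens and
    cross-attention to action tokens run in parallel (here sharing one set
    of projections for brevity) and are summed into the residual stream,
    injecting spatial action context without serializing the two paths."""
    Wq, Wk, Wv = weights
    q = video_tokens @ Wq
    self_out = attention(q, video_tokens @ Wk, video_tokens @ Wv)
    cross_out = attention(q, action_tokens @ Wk, action_tokens @ Wv)
    return video_tokens + self_out + cross_out

# Toy usage: 5 video tokens attend to 3 action tokens, dim 8.
rng = np.random.default_rng(0)
d = 8
video = rng.standard_normal((5, d))
actions = rng.standard_normal((3, d))
weights = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]
out = parallel_context_block(video, actions, weights)
```

Because the action stream enters additively, swapping in action tokens from a different robot embodiment changes only the cross-attention input, which is one plausible reading of how such a block supports cross-embodiment control.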