GigaWorld-Policy: An Efficient Action-Centered World-Action Model
arXiv cs.CV / 3/19/2026
Key Points
- GigaWorld-Policy introduces an action-centered World-Action Model (WAM) that learns 2D pixel-action dynamics with optional video generation to accelerate robot policy learning.
- Policy training is split into predicting future action sequences conditioned on the current observation and generating future videos conditioned on those actions, with both signals supervised to encourage physically plausible actions through visual-dynamics constraints.
- A causal design ensures future video tokens do not influence action tokens, enabling faster action inference when future-video generation is disabled at deployment.
- In experiments on real-world robotic platforms, GigaWorld-Policy runs 9x faster at inference than Motus while improving task success by 7%; it also improves on pi-0.5 by 95% on RoboTwin 2.0.
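
The causal design described in the key points can be illustrated with a block-level attention mask. The sketch below is not the paper's implementation: the token ordering (observation, then action, then future video), the helper names `build_block_mask` and `action_rows`, and the choice of full attention inside each group are all assumptions made for illustration. It shows only the property the summary states, namely that action tokens never attend to future-video tokens, so removing the video tokens leaves the action computation unchanged.

```python
from typing import List


def build_block_mask(n_obs: int, n_act: int, n_vid: int) -> List[List[bool]]:
    """Return mask[q][k] = True when query token q may attend to key token k.

    Assumed token order: [observation | action | future video].
    """
    # Group index per token: 0 = observation, 1 = action, 2 = future video.
    groups = [0] * n_obs + [1] * n_act + [2] * n_vid
    # A query may attend to keys in its own group or any earlier group,
    # so action queries never see future-video keys.
    return [[gk <= gq for gk in groups] for gq in groups]


def action_rows(mask: List[List[bool]], n_obs: int, n_act: int) -> List[List[bool]]:
    """Slice out the action rows, restricted to observation and action keys."""
    return [row[: n_obs + n_act] for row in mask[n_obs : n_obs + n_act]]


# Hypothetical sizes: 2 observation, 3 action, 4 future-video tokens.
full = build_block_mask(n_obs=2, n_act=3, n_vid=4)
no_video = build_block_mask(n_obs=2, n_act=3, n_vid=0)

# Check 1: no action query (rows 2..4) attends to a video key (columns 5..8).
leak = any(full[q][k] for q in range(2, 5) for k in range(5, 9))

# Check 2: the action rows are identical with and without video tokens.
same = action_rows(full, 2, 3) == action_rows(no_video, 2, 3)

print(leak, same)  # prints: False True
```

Because the action rows do not depend on the video columns, a deployment that skips future-video generation can drop those tokens entirely, which is consistent with the faster action inference the summary reports.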