Motion Forcing: A Decoupled Framework for Robust Video Generation in Motion Dynamics
arXiv cs.CV / 3/12/2026
Key Points
- The paper identifies a trilemma in video generation among high visual quality, physical consistency, and controllability, and notes that this balance degrades in complex scenes such as collisions or dense traffic.
- It introduces Motion Forcing, a decoupled framework that separates physical reasoning from visual synthesis using a hierarchical Point-Shape-Appearance paradigm.
- It proposes Masked Point Recovery, a training strategy that masks input anchors and requires the model to reconstruct the complete dynamic depth, encouraging it to learn latent physical laws such as inertia.
- Extensive experiments on autonomous driving benchmarks and on physics and robotics tasks show that Motion Forcing outperforms state-of-the-art baselines and keeps all three trilemma properties stable in challenging scenarios.
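To make the idea behind Masked Point Recovery more concrete, here is a minimal sketch of a generic masked-reconstruction objective over point depth tracks. It is not the paper's implementation. The summary does not describe the masking granularity, the depth representation, or the loss, so all of the following are illustrative assumptions: `tracks[n][t]` holds the depth of anchor point `n` at frame `t`, the first two frames stay visible, the loss is an L1 error scored only on hidden entries, and the helpers `mask_anchors`, `reconstruct_constant_velocity`, and `masked_l1_loss` are hypothetical. A hand-written constant-velocity extrapolator stands in for the learned model to show how an inertia-like prior recovers masked points.

```python
import random
from typing import List

# Hypothetical layout: tracks[n][t] = depth of anchor point n at frame t.
Tracks = List[List[float]]
Mask = List[List[bool]]


def mask_anchors(tracks: Tracks, mask_ratio: float, seed: int = 0) -> Mask:
    """Randomly hide (point, frame) entries; True means masked.

    Assumption: the first two frames of each track stay visible so a
    velocity can always be estimated from observed context.
    """
    rng = random.Random(seed)
    return [
        [False, False] + [rng.random() < mask_ratio for _ in track[2:]]
        for track in tracks
    ]


def reconstruct_constant_velocity(tracks: Tracks, mask: Mask) -> Tracks:
    """Stand-in for a learned model: fill each masked entry by extrapolating
    the most recent estimated velocity (a simple inertia prior)."""
    out = []
    for track, row in zip(tracks, mask):
        est = [track[0], track[1]]
        for t in range(2, len(track)):
            if row[t]:
                est.append(est[t - 1] + (est[t - 1] - est[t - 2]))
            else:
                est.append(track[t])
        out.append(est)
    return out


def masked_l1_loss(pred: Tracks, target: Tracks, mask: Mask) -> float:
    """Mean absolute error computed only over masked (hidden) entries."""
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Two points moving at constant depth velocity over 8 frames.
    tracks = [[10.0 - 0.5 * t for t in range(8)], [2.0 + 0.25 * t for t in range(8)]]
    mask = mask_anchors(tracks, mask_ratio=0.5, seed=1)
    pred = reconstruct_constant_velocity(tracks, mask)
    print(masked_l1_loss(pred, tracks, mask))  # constant velocity is recovered exactly
```

Because the loss scores only hidden entries, a reconstructor gains nothing by copying its inputs and must rely on a prior about the dynamics. In this toy case, constant-velocity motion is recovered with zero error, while accelerating tracks are not. A learned model trained this way would be pushed to internalize richer regularities than this fixed extrapolator.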




