MotionScape: A Large-Scale Real-World Highly Dynamic UAV Video Dataset for World Models
arXiv cs.CV / 4/10/2026
Key Points
- MotionScape is a new large-scale, real-world UAV-view video dataset designed to improve world models’ ability to predict complex 3D dynamics under fast, unconstrained 6-DoF camera motion.
- The dataset includes 30+ hours of 4K videos (4.5M+ frames) with semantically and geometrically aligned samples, pairing each video with accurate 6-DoF camera trajectories and fine-grained natural-language descriptions.
- Its construction uses an automated multi-stage pipeline combining CLIP-based relevance filtering, temporal segmentation, robust visual SLAM for trajectory recovery, and LLM-driven semantic annotation.
- Experiments reported in the paper indicate that the aligned semantic/geometric annotations improve existing world models’ simulation quality for complex 3D dynamics and large viewpoint shifts, supporting better UAV planning and decision-making.
- MotionScape is publicly available via the provided GitHub link, enabling researchers to train and evaluate UAV-oriented world models with realistic motion priors.
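
The first two stages of the construction pipeline (relevance filtering followed by temporal segmentation) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-frame embeddings and a text-query embedding have already been computed by a CLIP-style encoder, and the function names, the `threshold` value, and the `min_len` parameter are hypothetical choices made for illustration. The SLAM and LLM annotation stages are omitted.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def relevant_segments(
    frame_embs: Sequence[Sequence[float]],
    query_emb: Sequence[float],
    threshold: float = 0.25,  # hypothetical similarity cutoff
    min_len: int = 2,         # hypothetical minimum clip length, in frames
) -> List[Tuple[int, int]]:
    """Group consecutive relevant frames into (start, end) clips, end exclusive.

    A frame counts as relevant when its embedding's cosine similarity to the
    query embedding is at least `threshold`. Runs shorter than `min_len`
    frames are discarded.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i, emb in enumerate(frame_embs):
        keep = cosine(emb, query_emb) >= threshold
        if keep and start is None:
            start = i  # a new run of relevant frames begins
        elif not keep and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # the run ended at frame i
    # Close a run that extends to the last frame.
    if start is not None and len(frame_embs) - start >= min_len:
        segments.append((start, len(frame_embs)))
    return segments


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real encoder outputs.
    query = [1.0, 0.0]
    frames = [[1, 0], [0.9, 0.1], [0, 1], [1, 0],
              [0, 1], [1, 0.1], [1, 0], [0.8, 0.2]]
    print(relevant_segments(frames, query, threshold=0.5))
```

In a real pipeline, each resulting clip would then be passed to trajectory recovery and captioning; the thresholding-then-grouping step shown here only decides which frame ranges are worth keeping.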



