SymphoMotion: Joint Control of Camera Motion and Object Dynamics for Coherent Video Generation

arXiv cs.CV / 4/7/2026

Key Points

  • SymphoMotion is introduced as a unified motion-control framework that jointly manages camera trajectories and object dynamics for more coherent video generation.
  • The approach uses a Camera Trajectory Control module to stabilize viewpoint transitions with explicit camera paths and geometry-aware cues, avoiding ambiguity between parallax and object motion.
  • An Object Dynamics Control module combines 2D visual guidance with 3D trajectory embeddings to support depth-aware, spatially consistent object manipulation.
  • The paper also releases RealCOD-25K, a new real-world dataset providing paired camera poses and object-level 3D trajectories across varied indoor and outdoor scenes to address a data gap.
  • Experiments and user studies report improved performance over existing methods in visual fidelity, camera controllability, and object-motion accuracy, and the project claims a new benchmark for unified motion control.

Abstract

Controlling both camera motion and object dynamics is essential for coherent and expressive video generation, yet current methods typically handle only one motion type or rely on ambiguous 2D cues that entangle camera-induced parallax with true object movement. We present SymphoMotion, a unified motion-control framework that jointly governs camera trajectories and object dynamics within a single model. SymphoMotion features a Camera Trajectory Control mechanism that integrates explicit camera paths with geometry-aware cues to ensure stable, structurally consistent viewpoint transitions, and an Object Dynamics Control mechanism that combines 2D visual guidance with 3D trajectory embeddings to enable depth-aware, spatially coherent object manipulation. To support large-scale training and evaluation, we further construct RealCOD-25K, a comprehensive real-world dataset containing paired camera poses and object-level 3D trajectories across diverse indoor and outdoor scenes, addressing a key data gap in unified motion control. Extensive experiments and user studies show that SymphoMotion significantly outperforms existing methods in visual fidelity, camera controllability, and object-motion accuracy, establishing a new benchmark for unified motion control in video generation. Code and data are publicly available at https://grenoble-zhang.github.io/SymphoMotion/.
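
To make the 2D-cue ambiguity mentioned in the abstract concrete, the following is a minimal sketch using a textbook pinhole camera. It is not taken from the SymphoMotion code. The `Camera` class, the `project` helper, the focal length, and all coordinates are hypothetical values chosen for illustration. The sketch shows that a camera translation and an opposite object translation can produce the same 2D pixel displacement, while points at different depths respond differently to the same camera move.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical pinhole camera translated along x, with no rotation."""
    tx: float              # camera x position in world units
    focal: float = 500.0   # focal length in pixels (illustrative value)


def project(point, cam):
    """Project a world point (x, y, z) to pixel offsets from the principal point."""
    x, y, z = point
    xc = x - cam.tx  # world -> camera coordinates (translation only)
    return (cam.focal * xc / z, cam.focal * y / z)


# Case A: static object at depth 4, camera moves 0.2 units to the right.
u_a0, _ = project((0.0, 0.0, 4.0), Camera(tx=0.0))
u_a1, _ = project((0.0, 0.0, 4.0), Camera(tx=0.2))
flow_a = u_a1 - u_a0

# Case B: static camera, object at depth 4 moves 0.2 units to the left.
u_b0, _ = project((0.0, 0.0, 4.0), Camera(tx=0.0))
u_b1, _ = project((-0.2, 0.0, 4.0), Camera(tx=0.0))
flow_b = u_b1 - u_b0

# Case C: same camera move as A, but a static point twice as far away.
u_c0, _ = project((0.0, 0.0, 8.0), Camera(tx=0.0))
u_c1, _ = project((0.0, 0.0, 8.0), Camera(tx=0.2))
flow_c = u_c1 - u_c0

print(f"camera move, depth 4: {flow_a:.1f} px")
print(f"object move, depth 4: {flow_b:.1f} px")
print(f"camera move, depth 8: {flow_c:.1f} px")
```

Cases A and B yield the same horizontal displacement, so a 2D track alone cannot tell which one occurred. Case C shows that camera-induced parallax scales with inverse depth, which is one reason explicit camera poses and 3D trajectories are useful as conditioning signals.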