AI Navigate

Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos

arXiv cs.CV / 3/20/2026


Key Points

  • The paper shows that existing metrics like FVD fail to capture 3D geometric distortions in dynamically generated videos.
  • It introduces SGC, a metric that measures 3D Spatial Geometric Consistency by comparing local camera pose estimates across different static sub-regions.
  • The method separates dynamic from static regions, partitions the static background into coherent sub-regions, predicts per-pixel depth, estimates a local camera pose for each sub-region, and computes the divergence among these poses to quantify inconsistencies.
  • Experiments with real and generated videos demonstrate that SGC robustly detects geometric failures that prior metrics miss.

Abstract

Recent generative models can produce high-fidelity videos, yet they often exhibit 3D spatial geometric inconsistencies. Existing evaluation methods fail to accurately characterize these inconsistencies: fidelity-centric metrics like FVD are insensitive to geometric distortions, while consistency-focused benchmarks often penalize valid foreground dynamics. To address this gap, we introduce SGC, a metric for evaluating 3D Spatial Geometric Consistency in dynamically generated videos. We quantify geometric consistency by measuring the divergence among multiple camera poses estimated from distinct local regions. Our approach first separates static from dynamic regions, then partitions the static background into spatially coherent sub-regions. We predict depth for each pixel, estimate a local camera pose for each sub-region, and compute the divergence among these poses to quantify geometric consistency. Experiments on real and generated videos demonstrate that SGC robustly quantifies geometric inconsistencies, effectively identifying critical failures missed by existing metrics.
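The core idea, divergence among per-sub-region camera poses, can be sketched as below. This is a minimal illustration, not the paper's implementation: it assumes each static sub-region has already yielded a pose estimate (rotation matrix and translation vector), and it aggregates pairwise disagreement via the geodesic rotation angle and Euclidean translation distance. The paper's exact divergence formula, pose estimator, and depth model are not specified here.

```python
import numpy as np

def rotation_geodesic(R1, R2):
    # Angle (radians) of the relative rotation R1^T R2:
    # the standard geodesic distance on SO(3).
    cos_theta = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def pose_divergence(poses):
    """Mean pairwise disagreement among local camera poses.

    poses: list of (R, t) tuples, one per static sub-region,
           where R is a 3x3 rotation matrix and t a 3-vector.
    Returns (mean rotation divergence in radians,
             mean translation divergence).
    Hypothetical aggregation: a perfectly consistent video gives
    (0, 0); geometric distortions push both terms up.
    """
    rot_d, trans_d = [], []
    for i in range(len(poses)):
        for j in range(i + 1, len(poses)):
            Ri, ti = poses[i]
            Rj, tj = poses[j]
            rot_d.append(rotation_geodesic(Ri, Rj))
            trans_d.append(float(np.linalg.norm(ti - tj)))
    return float(np.mean(rot_d)), float(np.mean(trans_d))
```

For example, three identical poses yield zero divergence on both terms, while mixing in a pose rotated 90° about the z-axis produces a rotation divergence of π/2 between that pair.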