Incoherent Deformation, Not Capacity: Diagnosing and Mitigating Overfitting in Dynamic Gaussian Splatting

arXiv cs.CV / 4/21/2026

📰 News · Models & Research

Key Points

  • Dynamic 3D Gaussian Splatting models achieve high PSNR on training views but generalize poorly, with an average D-NeRF train-test gap of 6.18 dB and up to 11 dB on individual scenes.
  • Systematic ablation of Adaptive Density Control shows that disabling splitting drastically reduces the Gaussian count (44K→3K) and largely eliminates overfitting, indicating that splitting accounts for the bulk (over 80%) of the PSNR gap.
  • However, the paper finds that capacity alone does not explain the gap: adding an Elastic Energy Regularization (EER) that enforces deformation smoothness reduces the train-test PSNR gap by 40.8% even while increasing the number of Gaussians by 85%.
  • Measuring deformation strain on trained checkpoints shows EER dramatically lowers strain (about 99.7% mean reduction); in every scene, the median Gaussian under EER is less strained than even the best-behaved (1st-percentile) baseline Gaussian.
  • Additional regularizers (GAD and PTDrop) further reduce the gap (by up to 57%), and the coherence-based mitigation transfers to an alternative deformation architecture (Deformable-3DGS) and to real monocular video with minimal quality cost.
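The EER idea above can be sketched as a neighborhood-smoothness penalty on the deformation field: nearby Gaussians are pushed to move together. The penalty form, the neighbor count `k`, and the function name are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def elastic_energy(positions, displacements, k=8):
    """Illustrative smoothness penalty on a per-Gaussian deformation field.

    Penalizes differences between each Gaussian's displacement and those of
    its k nearest neighbors, so that nearby Gaussians deform coherently.
    Brute-force kNN; fine for small point counts.
    """
    # pairwise squared distances between Gaussian centers, shape (n, n)
    d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # exclude self-neighbors
    nbrs = np.argsort(d2, axis=1)[:, :k]    # indices of k nearest neighbors
    # displacement mismatch to each neighbor, shape (n, k, 3)
    diff = displacements[:, None, :] - displacements[nbrs]
    return float((diff ** 2).sum(-1).mean())
```

A rigid translation (every Gaussian moving identically) incurs zero penalty, while spatially incoherent displacements are penalized; added to the photometric loss with a weight λ, this is the shape of penalty the paper's findings motivate.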

Abstract

Dynamic 3D Gaussian Splatting methods achieve strong training-view PSNR on monocular video but generalize poorly: on the D-NeRF benchmark we measure an average train-test PSNR gap of 6.18 dB, rising to 11 dB on individual scenes. We report two findings that together account for most of that gap. Finding 1 (the role of splitting). A systematic ablation of the Adaptive Density Control pipeline (split, clone, prune, frequency, threshold, schedule) shows that splitting is responsible for over 80% of the gap: disabling splitting collapses the cloud from 44K to 3K Gaussians and the gap from 6.18 dB to 1.15 dB. Across all threshold-varying ablations, the gap is log-linear in Gaussian count (r = 0.995, bootstrap 95% CI [0.99, 1.00]), which suggests a capacity-based explanation. Finding 2 (the role of deformation coherence). We show that the capacity explanation is incomplete. A local-smoothness penalty on the per-Gaussian deformation field, Elastic Energy Regularization (EER), reduces the gap by 40.8% while growing the cloud by 85%. Measured directly on trained checkpoints, per-Gaussian strain under EER drops by 99.72% in the mean (99.80% in the median) across all 8 scenes; on 8/8 scenes the median Gaussian under EER is less strained than the 1st-percentile (best-behaved) Gaussian under the baseline. Alongside EER, we evaluate two further regularizers: GAD, a loss-rate-aware densification threshold, and PTDrop, a jitter-weighted Gaussian dropout. GAD+EER reduces the gap by 48%; adding PTDrop and a soft growth cap reaches 57%. We confirm that the coherence-based mitigation generalizes to (a) a different deformation architecture (Deformable-3DGS, 40.6% gap reduction at a re-tuned λ), and (b) real monocular video (4 HyperNeRF scenes, reducing the mean PSNR gap by 14.9% at the same λ as on D-NeRF, with near-zero quality cost). Overfitting in dynamic 3DGS is driven by incoherent deformation, not parameter count.
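The log-linear capacity trend in Finding 1 can be checked with an ordinary least-squares fit of the PSNR gap against log Gaussian count. The numbers below are illustrative stand-ins consistent with the reported endpoints (3K/1.15 dB and 44K/6.18 dB), not the paper's ablation data:

```python
import numpy as np

# Hypothetical (gaussian_count, psnr_gap_db) pairs from threshold ablations;
# only the endpoints are taken from the reported results.
counts = np.array([3_000, 8_000, 15_000, 30_000, 44_000], dtype=float)
gaps = np.array([1.15, 2.6, 3.9, 5.3, 6.18])

log_counts = np.log10(counts)
slope, intercept = np.polyfit(log_counts, gaps, 1)  # gap ≈ slope*log10(N) + b
r = np.corrcoef(log_counts, gaps)[0, 1]             # Pearson correlation
print(f"slope = {slope:.2f} dB per decade of Gaussians, r = {r:.3f}")
```

A correlation near 1 on such a fit is what makes the capacity story look compelling in Finding 1; Finding 2 shows it is nevertheless incomplete, since EER shrinks the gap while the count grows.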