SpectralSplats: Robust Differentiable Tracking via Spectral Moment Supervision

arXiv cs.CV / 3/26/2026


Key Points

  • The paper identifies a key failure mode in differentiable 3D Gaussian Splatting (3DGS) tracking: when camera misalignment removes spatial overlap, standard photometric losses produce vanishing gradients and optimization gets stuck.
  • SpectralSplats addresses this by moving supervision to the frequency domain using “Spectral Moments,” i.e., global complex sinusoidal features, to maintain a directional gradient across the full image even with no pixel overlap.
  • To avoid high-frequency periodic local minima, the method introduces a Frequency Annealing schedule that transitions optimization from a global basin toward accurate spatial alignment.
  • Experiments show SpectralSplats can serve as a drop-in replacement for spatial losses across multiple deformation parameterizations (e.g., MLPs and sparse control points), enabling recovery from severely misaligned initializations where baseline tracking fails.
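The contrast between a spatial loss and a frequency-domain loss can be illustrated with a toy example. The sketch below is not the paper's implementation: it replaces the 3DGS renderer with a single isotropic Gaussian blob, and the names `render_blob`, `spectral_moments`, `pixel_loss`, `spectral_loss`, and `grad_x` are hypothetical helpers introduced only for illustration. It renders a blob 24 pixels away from a target blob (no meaningful overlap) and compares the finite-difference gradient of a pixel-wise L2 loss with that of a loss on a few low-frequency complex sinusoidal projections.

```python
import numpy as np

def render_blob(center, size=64, sigma=2.0):
    """Render an isotropic 2D Gaussian blob (toy stand-in for a splatted object)."""
    ys, xs = np.mgrid[0:size, 0:size].astype(np.float64)
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def spectral_moments(img, max_freq):
    """Project the image onto complex sinusoids with integer frequencies |kx|, |ky| <= max_freq."""
    size = img.shape[0]
    ys, xs = np.mgrid[0:size, 0:size].astype(np.float64)
    ks = np.arange(-max_freq, max_freq + 1)
    moments = []
    for ky in ks:
        for kx in ks:
            basis = np.exp(-2j * np.pi * (kx * xs + ky * ys) / size)
            moments.append(np.sum(img * basis))
    return np.array(moments)

def pixel_loss(center, target):
    """Standard photometric (pixel-wise L2) loss."""
    return np.mean((render_blob(center) - target) ** 2)

def spectral_loss(center, target, max_freq):
    """Squared distance between the spectral moments of the render and the target."""
    diff = spectral_moments(render_blob(center), max_freq) - spectral_moments(target, max_freq)
    return np.mean(np.abs(diff) ** 2)

def grad_x(loss_fn, center, eps=1e-3):
    """Central finite-difference derivative of the loss w.r.t. the blob's x-coordinate."""
    hi = loss_fn(np.array([center[0] + eps, center[1]]))
    lo = loss_fn(np.array([center[0] - eps, center[1]]))
    return (hi - lo) / (2 * eps)

# Target blob at x=44, rendered blob at x=20: 12 standard deviations apart.
target = render_blob(np.array([44.0, 32.0]))
start = np.array([20.0, 32.0])

g_pixel = grad_x(lambda c: pixel_loss(c, target), start)
g_spec = grad_x(lambda c: spectral_loss(c, target, max_freq=1), start)

print(f"pixel-loss gradient:    {g_pixel:.3e}")  # numerically zero: no overlap, no signal
print(f"spectral-loss gradient: {g_spec:.3e}")   # negative: descent moves the blob toward +x
```

In this toy setting the pixel-loss gradient is numerically zero, while the spectral-loss gradient is clearly negative, so a gradient-descent step moves the blob in the +x direction toward the target. Note that the lowest frequency here has a period equal to the image width, so the direction is only reliable for offsets below half that width.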

Abstract

3D Gaussian Splatting (3DGS) enables real-time, photorealistic novel view synthesis, making it a highly attractive representation for model-based video tracking. However, leveraging the differentiability of the 3DGS renderer "in the wild" remains notoriously fragile. A fundamental bottleneck lies in the compact, local support of the Gaussian primitives. Standard photometric objectives implicitly rely on spatial overlap; if severe camera misalignment places the rendered object outside the target's local footprint, gradients strictly vanish, leaving the optimizer stranded. We introduce SpectralSplats, a robust tracking framework that resolves this "vanishing gradient" problem by shifting the optimization objective from the spatial to the frequency domain. By supervising the rendered image via a set of global complex sinusoidal features (Spectral Moments), we construct a global basin of attraction, ensuring that a valid, directional gradient toward the target exists across the entire image domain, even when pixel overlap is completely nonexistent. To harness this global basin without introducing periodic local minima associated with high frequencies, we derive a principled Frequency Annealing schedule from first principles, gracefully transitioning the optimizer from global convexity to precise spatial alignment. We demonstrate that SpectralSplats acts as a seamless, drop-in replacement for spatial losses across diverse deformation parameterizations (from MLPs to sparse control points), successfully recovering complex deformations even from severely misaligned initializations where standard appearance-based tracking catastrophically fails.
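The role of annealing can be sketched in one dimension. The example below is a toy under strong assumptions, not the schedule derived in the paper: the object is idealized as a unit-amplitude point, so its moment at integer frequency k is exp(-2πik·p/N), and the per-frequency loss reduces to 2(1 - cos(2πk(p - t)/N)). The coarse-to-fine schedule `[[1], [1, 2], [1, 2, 4], [1, 2, 4, 8]]`, the step-size rule, and the function names `spectral_grad` and `track` are all illustrative choices.

```python
import numpy as np

N = 64.0  # length of the 1D image domain in pixels (assumed for this toy)

def spectral_grad(p, t, freqs):
    """d/dp of sum_k 2 * (1 - cos(w_k * (p - t))), with w_k = 2*pi*k/N."""
    w = 2.0 * np.pi * np.asarray(freqs, dtype=np.float64) / N
    return float(np.sum(2.0 * w * np.sin(w * (p - t))))

def track(p0, t, schedule, steps_per_stage=50):
    """Gradient descent on position p, switching frequency sets stage by stage."""
    p = p0
    for freqs in schedule:
        w = 2.0 * np.pi * np.asarray(freqs, dtype=np.float64) / N
        # Step size scaled to the largest possible curvature, 2 * sum(w^2),
        # so every step decreases the current stage's loss.
        lr = 0.25 / np.sum(w ** 2)
        for _ in range(steps_per_stage):
            p -= lr * spectral_grad(p, t, freqs)
    return p

target_pos = 44.0
start_pos = 21.0

# High frequency only: the loss is periodic with period N/8 = 8 pixels,
# so descent settles in the nearest periodic minimum instead of the target.
p_high_only = track(start_pos, target_pos, schedule=[[8]])

# Coarse-to-fine: start with the lowest frequency, then add higher ones.
p_annealed = track(start_pos, target_pos, schedule=[[1], [1, 2], [1, 2, 4], [1, 2, 4, 8]])

print(f"high frequency only: p = {p_high_only:.4f}")
print(f"annealed schedule:   p = {p_annealed:.4f}")
```

With the high frequency alone, the optimizer stops at p ≈ 20, a periodic minimum three periods away from the target. With the coarse-to-fine schedule, the lowest frequency first pulls the estimate into the correct basin, and the higher frequencies then refine within it, ending at p ≈ 44.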