Two-View Accumulation as the Primary Training Lever for Hybrid-Capture Gaussian Splatting: A Variance-Decomposition View of When Gradient Surgery Helps

arXiv cs.CV / 5/4/2026


Key Points

Standard 3D Gaussian Splatting (3DGS), trained for 30K iterations with one rendered view per optimizer step, under-fits the minority camera-distance regime in hybrid-capture novel view synthesis by 1-3 dB PSNR on five benchmarks.
Among compute-matched training modifications (vanilla 60K iterations, GradNorm magnitude correction, direction-aware near/far gradient surgery, projective preconditioning, and confidence-gated sample-level surgery), the one that closes the gap is the simplest structural change: rendering two views per optimizer step.
The pairing rule (geometry-defined near/far, random, or active loss-disparity) does not change PSNR beyond seed variance on any of the five scenes; accumulating two views per step is what makes the difference.
The paper introduces a variance-decomposition framework arguing that, in bimodal camera-distance regimes, between-regime gradient variance is small relative to within-regime variance in 3DGS, making variance-reduction from two-view accumulation the dominant benefit.
The findings generalize to Scaffold-GS and Pixel-GS backbones and are presented as a clear characterization of which training-side axes change PSNR (and which do not) for hybrid-capture 3DGS.
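
To make the structural change in the key points concrete, here is a minimal toy sketch in Python. It is not the paper's code and does not reproduce its results: the scalar parameter, the quadratic per-view loss, the rule names ("single", "random_pair", "near_far_pair"), and the target values are all hypothetical stand-ins. It only illustrates what "two views per optimizer step" and "compute-matched" mean in a training loop.

```python
import random

def sample_views(rng, near, far, rule):
    """Pick the views rendered for one optimizer step (hypothetical rule names)."""
    if rule == "single":          # baseline: one view per step
        return [rng.choice(near + far)]
    if rule == "random_pair":     # two views drawn without regard to regime
        return [rng.choice(near + far), rng.choice(near + far)]
    if rule == "near_far_pair":   # one view from each regime; with unbalanced
        # regimes this reweights the objective toward the minority regime
        return [rng.choice(near), rng.choice(far)]
    raise ValueError(f"unknown rule: {rule}")

def train(near, far, rule, steps, lr=0.05, seed=0):
    """Toy SGD on a scalar theta; a view with target t has loss 0.5 * (theta - t) ** 2."""
    rng = random.Random(seed)
    theta, renders = 0.0, 0
    for _ in range(steps):
        batch = sample_views(rng, near, far, rule)
        # Accumulate (average) the gradients of every rendered view,
        # then take exactly one optimizer step.
        grad = sum(theta - t for t in batch) / len(batch)
        renders += len(batch)
        theta -= lr * grad
    return theta, renders

near_targets = [0.9, 1.0, 1.1]                  # hypothetical minority regime
far_targets = [1.8, 1.9, 2.0, 2.1, 2.2, 2.3]    # hypothetical majority regime

# Compute-matched budgets: every configuration renders 60 views in total.
theta_single, renders_single = train(near_targets, far_targets, "single", steps=60)
theta_pair, renders_pair = train(near_targets, far_targets, "random_pair", steps=30)
theta_nf, renders_nf = train(near_targets, far_targets, "near_far_pair", steps=30)
```

The pairing rule is isolated in `sample_views`, so swapping rules leaves the accumulation step untouched. That mirrors the comparison described above, where the rule varies while the two-view structure stays fixed.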

Abstract

Hybrid-capture novel view synthesis combines images at substantially different camera distances (e.g., aerial drone and ground-level views). Standard 3D Gaussian Splatting (3DGS), trained for 30K iterations with one rendered view per optimizer step, under-fits the minority regime by 1-3 dB on five hybrid-capture benchmarks. We isolate the lever that closes this gap. Among compute-matched alternatives -- vanilla 60K iterations, magnitude corrections (GradNorm), direction-aware near/far gradient surgery, projective preconditioning, confidence-gated sample-level surgery, and a random two-view-per-step control -- the simplest structural change wins: rendering two views per optimizer step. The pairing rule (geometry-defined near/far, random, or active loss-disparity) does not change PSNR beyond seed variance on any of the five scenes; the structural change of having two views per step does. We propose a variance-decomposition framework that predicts and explains this finding: under bimodal camera regimes, between-regime gradient variance turns out to be small relative to within-regime variance in 3DGS, so structured and random pairings are variance-equivalent in expectation, and the variance halving from two-view accumulation itself is the dominant effect. We verify the framework on five scenes whose camera-altitude bimodality coefficients span [0.55, 1.00], and we report the negative result that direction-aware projection, magnitude correction, confidence gating, and an active loss-disparity pairing all fall within seed variance of random two-view pairing. The two-view structural lever transfers cleanly to the Scaffold-GS and Pixel-GS backbones. We position this work as an honest characterization of which training-side axes do and do not move PSNR for hybrid-capture 3DGS, together with the framework that explains why.
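
The variance argument in the abstract can be sketched with the law of total variance. The notation below is an assumption made for illustration, not the paper's: g is a scalar per-view gradient component, r is the regime label (near or far), and the two regimes are taken to be equally likely so that a near/far pair remains an unbiased estimate of the mean gradient.

```latex
\begin{align*}
\operatorname{Var}(g)
  &= \underbrace{\mathbb{E}_r\!\left[\operatorname{Var}(g \mid r)\right]}_{\sigma_w^2\ \text{(within-regime)}}
   + \underbrace{\operatorname{Var}_r\!\left(\mathbb{E}[g \mid r]\right)}_{\sigma_b^2\ \text{(between-regime)}} \\[4pt]
\text{one view:}\quad
  \operatorname{Var}(g) &= \sigma_w^2 + \sigma_b^2 \\[4pt]
\text{random pair:}\quad
  \operatorname{Var}\!\left(\tfrac{1}{2}(g_1 + g_2)\right)
  &= \tfrac{1}{2}\left(\sigma_w^2 + \sigma_b^2\right) \\[4pt]
\text{near/far pair:}\quad
  \operatorname{Var}\!\left(\tfrac{1}{2}(g_{\text{near}} + g_{\text{far}})\right)
  &= \tfrac{1}{4}\left(\sigma_{\text{near}}^2 + \sigma_{\text{far}}^2\right)
   = \tfrac{1}{2}\,\sigma_w^2
\end{align*}
```

Under these assumptions, structured pairing removes only the between-regime term, so the two pairing rules differ by a factor of \sigma_w^2 / (\sigma_w^2 + \sigma_b^2). When \sigma_b^2 is small relative to \sigma_w^2, that factor is close to 1 and both rules deliver essentially the same halving relative to one view per step.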


