ProDiG: Progressive Diffusion-Guided Gaussian Splatting for Aerial to Ground Reconstruction

arXiv cs.CV / 4/3/2026



Key Points

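ProDiG addresses the task of generating a 3D site model that stays consistent with ground-level viewpoints using aerial imagery alone. It proposes a progressive reconstruction method that resists geometric breakdown even across wide viewpoint gaps.
Rather than relying on post-hoc rendering refinement or multi-altitude ground truth, ProDiG synthesizes intermediate-altitude representations and refines the Gaussian representation stage by stage with a diffusion model.
A geometry-aware causal attention module injects geometric (epipolar) structure into reference-view diffusion inference, and a distance-adaptive module dynamically adjusts Gaussian scale and opacity according to camera distance. Together they yield stable reconstruction across large changes in distance.
In experiments on synthetic and real-world data, the authors report that ProDiG substantially outperforms existing methods in visual realism, 3D geometric consistency, and robustness to extreme viewpoint changes.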
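
The distance-adaptive module is described only at a high level here, so the following is a minimal sketch of one way such an adjustment could work, not the authors' formulation. The `Gaussian` class, the `adapt_to_camera` function, the `ref_distance` and `min_ratio` parameters, and the specific clamped-ratio rule are all assumptions made for illustration. The idea is that a Gaussian fitted from far away looks oversized and overly opaque when the camera moves close, so both values are damped as the camera approaches.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gaussian:
    position: tuple  # (x, y, z) center in world units
    scale: float     # isotropic radius (real splats are anisotropic)
    opacity: float   # alpha in [0, 1]


def adapt_to_camera(g, camera_pos, ref_distance, min_ratio=0.25):
    """Return a copy of `g` adjusted for a camera at `camera_pos`.

    Hypothetical rule (not from the paper):
        ratio = clamp(d / ref_distance, min_ratio, 1)
    where d is the camera-to-Gaussian distance and ref_distance is the
    distance at which the Gaussian was originally fitted, for example
    the aerial capture range. Scale shrinks linearly with the ratio and
    opacity with its square root.
    """
    d = math.dist(g.position, camera_pos)
    ratio = max(min_ratio, min(d / ref_distance, 1.0))
    return replace(g, scale=g.scale * ratio, opacity=g.opacity * math.sqrt(ratio))


g = Gaussian(position=(0.0, 0.0, 0.0), scale=1.0, opacity=0.8)
far = adapt_to_camera(g, camera_pos=(0.0, 0.0, 100.0), ref_distance=100.0)
near = adapt_to_camera(g, camera_pos=(0.0, 0.0, 10.0), ref_distance=100.0)
print(far.scale, far.opacity)    # unchanged at the reference distance
print(near.scale, near.opacity)  # damped, clamped by min_ratio
```

The lower clamp keeps nearby Gaussians from vanishing entirely. A learned or per-Gaussian rule would be a natural alternative to this fixed heuristic.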

Abstract

Generating ground-level views and coherent 3D site models from aerial-only imagery is challenging due to extreme viewpoint changes, missing intermediate observations, and large scale variations. Existing methods either refine renderings post-hoc, often producing geometrically inconsistent results, or rely on multi-altitude ground-truth, which is rarely available. Gaussian Splatting and diffusion-based refinements improve fidelity under small variations but fail under wide aerial-to-ground gaps. To address these limitations, we introduce ProDiG (Progressive Altitude Gaussian Splatting), a diffusion-guided framework that progressively transforms aerial 3D representations toward ground-level fidelity. ProDiG synthesizes intermediate-altitude views and refines the Gaussian representation at each stage using a geometry-aware causal attention module that injects epipolar structure into reference-view diffusion. A distance-adaptive Gaussian module dynamically adjusts Gaussian scale and opacity based on camera distance, ensuring stable reconstruction across large viewpoint gaps. Together, these components enable progressive, geometrically grounded refinement without requiring additional ground-truth viewpoints. Extensive experiments on synthetic and real-world datasets demonstrate that ProDiG produces visually realistic ground-level renderings and coherent 3D geometry, significantly outperforming existing approaches in terms of visual quality, geometric consistency, and robustness to extreme viewpoint changes.
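
The abstract says the attention module injects epipolar structure into reference-view diffusion but does not give the mechanism. As a rough illustration of the general idea, the sketch below adds a penalty to attention logits that grows with a reference pixel's distance from the epipolar line of the target pixel. The function names, the Gaussian-shaped bias, the `sigma` parameter, and the convention `x_ref^T F x_tgt = 0` for the fundamental matrix `F` are assumptions; the causal aspect of the paper's module is not modeled.

```python
import math


def epipolar_distance(F, target_px, ref_px):
    """Pixel distance from ref_px to the epipolar line of target_px.

    Assumes x_ref^T F x_tgt = 0 with homogeneous pixels (u, v, 1).
    """
    u, v = target_px
    x_tgt = (u, v, 1.0)
    # Epipolar line l = F @ x_tgt, written as a*x + b*y + c = 0.
    a, b, c = (sum(F[i][j] * x_tgt[j] for j in range(3)) for i in range(3))
    norm = math.hypot(a, b)
    if norm == 0.0:
        return 0.0  # degenerate line: apply no geometric penalty
    return abs(a * ref_px[0] + b * ref_px[1] + c) / norm


def biased_attention(logits, distances, sigma=2.0):
    """Softmax over logits plus an epipolar bias of -d^2 / (2 sigma^2)."""
    scores = [s - d * d / (2.0 * sigma * sigma) for s, d in zip(logits, distances)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy rectified pair (pure horizontal shift): epipolar lines are image rows.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
target = (5.0, 3.0)
ref_pixels = [(10.0, 3.0), (10.0, 7.0), (2.0, 4.0)]
dists = [epipolar_distance(F, target, p) for p in ref_pixels]
weights = biased_attention([0.0, 0.0, 0.0], dists)
print(dists)
print(weights)
```

With equal content logits, the reference pixel lying on the epipolar line receives the largest weight, which is the sense in which geometry steers where the target view looks in the reference view.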

