Test-Time Adaptation for Height Completion via Self-Supervised ViT Features and Monocular Foundation Models

arXiv cs.CV / 4/3/2026


Key Points

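  • Prior2DSM is a training-free test-time adaptation framework for completing heights in digital surface models (DSMs) that contain missing or outdated regions.
  • It combines self-supervised ViT features from DINOv3 with a monocular depth foundation model, propagating metric information into the missing regions through feature correspondences so that relative depth can be expressed in metric units.
  • Test-time adaptation (TTA) uses LoRA and a small MLP to estimate spatially varying scale and shift parameters that convert relative depth estimates into metric heights.
  • In experiments it outperforms interpolation methods, earlier prior-based rescaling approaches, and existing monocular depth estimation models, reducing RMSE by up to 46% relative to linear fitting while preserving structural fidelity.
  • The method extends to DSM updating and coupled RGB-DSM generation, suggesting it could support missing-data recovery and update workflows in geospatial applications.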
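The LoRA component mentioned in the key points can be illustrated with a minimal NumPy sketch of a generic LoRA linear layer, assuming the standard formulation in which a frozen weight W is augmented by a trainable low-rank update (alpha / r) * B @ A. The class name LoRALinear, the rank, alpha, and the 768-dimensional layer size are hypothetical choices for illustration; this is not Prior2DSM's code, and the summary does not state which layers are adapted.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (hypothetical sketch).

    Output: x @ W.T + (alpha / rank) * (x @ A.T) @ B.T, where W stays frozen
    and only A (rank x d_in) and B (d_out x rank) would be updated at test time.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, rank))                    # trainable up-projection, zero-init
        self.scaling = alpha / rank

    def __call__(self, x):
        # x: (n, d_in) -> (n, d_out); with B = 0 this equals the frozen layer.
        return x @ self.weight.T + self.scaling * (x @ self.A.T) @ self.B.T

    def merged_weight(self):
        # Fold the low-rank update into a single dense matrix for inference.
        return self.weight + self.scaling * self.B @ self.A

    def num_trainable(self):
        return self.A.size + self.B.size


# Usage with a hypothetical 768 x 768 projection.
W = np.random.default_rng(1).normal(size=(768, 768))
layer = LoRALinear(W, rank=4)
x = np.ones((2, 768))
print(np.allclose(layer(x), x @ W.T))  # True
print(layer.num_trainable(), W.size)   # 6144 589824
```

Because B starts at zero, the adapted layer initially reproduces the frozen one exactly, and only the small A and B matrices (here about 1% of the frozen parameters) would change during adaptation; merged_weight() folds the update back into one matrix.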

Abstract

Accurate digital surface models (DSMs) are essential for many geospatial applications, including urban monitoring, environmental analyses, infrastructure management, and change detection. However, large-scale DSMs frequently contain incomplete or outdated regions due to acquisition limitations, reconstruction artifacts, or changes in the built environment. Traditional height completion approaches rely primarily on spatial interpolation and assume spatial continuity; they therefore fail when objects are missing. Recent learning-based approaches improve reconstruction quality but typically require supervised training on sensor-specific datasets, limiting their generalization across domains and sensing conditions. We propose Prior2DSM, a training-free framework for metric DSM completion that operates entirely at test time by leveraging foundation models. Unlike previous height completion approaches that require task-specific training, the proposed method combines self-supervised Vision Transformer (ViT) features from DINOv3 with monocular depth foundation models to propagate metric information from incomplete height priors through semantic feature-space correspondence. Test-time adaptation (TTA) is performed using parameter-efficient low-rank adaptation (LoRA) together with a lightweight multilayer perceptron (MLP), which predicts spatially varying scale and shift parameters to convert relative depth estimates into metric heights. Experiments demonstrate consistent improvements over interpolation-based methods, prior-based height rescaling approaches, and state-of-the-art monocular depth estimation (MDE) models. Prior2DSM reduces reconstruction error while preserving structural fidelity, achieving up to a 46% reduction in RMSE compared to linear fitting of MDE outputs, and further enables DSM updating and coupled RGB-DSM generation.
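The conversion from relative depth d(x) to metric height h(x) can be written as h(x) = s(x) * d(x) + t(x). Linear fitting, the baseline the abstract compares against, uses a single global s and t; a spatially varying model lets s and t depend on per-pixel features. The NumPy sketch below is a simplified stand-in, not the paper's method: instead of a LoRA-adapted DINOv3 backbone with an MLP trained at test time, s(x) and t(x) are assumed to be affine in a generic per-pixel feature vector, which reduces the fit to ordinary least squares over the pixels covered by the prior DSM. The function names, the feature dimension, and the synthetic data are all hypothetical.

```python
import numpy as np


def design_matrix(feats, rel_depth):
    """Columns for h = s(x) * d + t(x), with s and t affine in the features."""
    d = rel_depth[:, None]
    ones = np.ones((rel_depth.shape[0], 1))
    # s(x) * d = (feats @ w_s + b_s) * d ; t(x) = feats @ w_t + b_t
    return np.hstack([feats * d, d, feats, ones])


def fit_global_affine(rel_depth, height, known):
    """Baseline: one scale and one shift for the whole tile (linear fitting)."""
    A = np.stack([rel_depth[known], np.ones(known.sum())], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, height[known], rcond=None)
    return s * rel_depth + t


def fit_varying_affine(feats, rel_depth, height, known):
    """Spatially varying scale/shift, fitted only where the prior DSM is valid."""
    X = design_matrix(feats, rel_depth)
    coef, *_ = np.linalg.lstsq(X[known], height[known], rcond=None)
    return X @ coef


def rmse(pred, target, mask):
    return float(np.sqrt(np.mean((pred[mask] - target[mask]) ** 2)))


# Synthetic tile (hypothetical): features stand in for ViT descriptors,
# rel_depth for the output of a monocular depth model.
rng = np.random.default_rng(0)
n, k = 2000, 3
feats = rng.normal(size=(n, k))
rel_depth = rng.uniform(0.0, 1.0, size=n)
true_scale = 20.0 + feats @ np.array([3.0, -2.0, 1.0])
true_shift = 5.0 + feats @ np.array([1.0, 0.5, -1.5])
height = true_scale * rel_depth + true_shift
known = rng.uniform(size=n) < 0.7   # about 70% of pixels covered by the prior
missing = ~known

print("global RMSE on missing pixels:", rmse(fit_global_affine(rel_depth, height, known), height, missing))
print("varying RMSE on missing pixels:", rmse(fit_varying_affine(feats, rel_depth, height, known), height, missing))
```

On this noise-free synthetic tile the spatially varying fit recovers the missing heights almost exactly, while the single global scale and shift leaves a much larger residual; real DSM priors, features, and depth maps will not behave this cleanly, which is one motivation for the nonlinear MLP the abstract describes.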