DINO_4D: Semantic-Aware 4D Reconstruction

arXiv cs.CV / 4/14/2026

Key Points

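DINO_4D brings semantics-aware tracking and reconstruction to the 4D reconstruction of dynamic scenes, with the aim of suppressing the semantic drift that arises during dynamic tracking.
It uses frozen DINOv3 features as structural priors, in a design that bridges low-level geometric cues and high-level semantic understanding.
The authors evaluate on the Point Odyssey and TUM-Dynamics benchmarks and report substantial gains in Tracking Accuracy (APD) and Reconstruction Completeness while keeping time complexity linear, O(T), as in prior methods.
The method is positioned as a new paradigm for semantic-aware 4D World Models that aim to combine geometric precision with semantic understanding.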
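To make the points about semantic features and linear cost concrete, here is a minimal sketch. It is not the DINO_4D architecture, which this summary does not describe. It assumes each tracked point carries a feature vector from some frozen encoder (the toy 2-D vectors below stand in for real DINOv3 features), and that each frame is matched against a fixed reference frame. The function names `cosine`, `match_to_reference`, and `track` are hypothetical. Matching every frame to the same reference once means T frames need T matching passes, and errors do not accumulate from frame to frame as they would with chained matching.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def match_to_reference(ref_feats, frame_feats):
    """For each reference point, return the index of the most similar point in the frame."""
    return [
        max(range(len(frame_feats)), key=lambda j: cosine(r, frame_feats[j]))
        for r in ref_feats
    ]

def track(ref_feats, frames):
    """Match each frame against the fixed reference once: T frames, T passes."""
    return [match_to_reference(ref_feats, f) for f in frames]

# Toy stand-ins for frozen per-point features (assumption: real ones would
# come from a frozen encoder and have many more dimensions).
ref = [[1.0, 0.0], [0.0, 1.0]]
frames = [
    [[0.9, 0.1], [0.1, 0.9]],  # points keep their order
    [[0.2, 0.8], [0.8, 0.2]],  # points swap order
]
tracks = track(ref, frames)
print(tracks)
```

In the second toy frame the two points swap positions in the list, and the feature match still assigns each reference point to its semantically closest counterpart.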

Abstract

At the intersection of computer vision and robotic perception, 4D reconstruction of dynamic scenes serves as the critical bridge connecting low-level geometric sensing with high-level semantic understanding. We present DINO_4D, which introduces frozen DINOv3 features as structural priors, injecting semantic awareness into the reconstruction process to effectively suppress semantic drift during dynamic tracking. Experiments on the Point Odyssey and TUM-Dynamics benchmarks demonstrate that our method maintains the linear time complexity O(T) of its predecessors while significantly improving Tracking Accuracy (APD) and Reconstruction Completeness. DINO_4D establishes a new paradigm for constructing 4D World Models that possess both geometric precision and semantic understanding.
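The abstract names APD as its tracking metric without defining it. The sketch below assumes the reading common in 3D point-tracking work: the average, over a set of distance thresholds, of the percentage of predicted points that fall within each threshold of the ground truth. The thresholds, the toy points, and the function names `percent_within` and `apd` are illustrative assumptions, not values or code from the paper.

```python
import math

def percent_within(pred, gt, delta):
    """Percentage of points whose Euclidean error is below delta."""
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) < delta)
    return 100.0 * hits / len(gt)

def apd(pred, gt, thresholds):
    """Mean of percent_within over all thresholds (assumed APD definition)."""
    return sum(percent_within(pred, gt, d) for d in thresholds) / len(thresholds)

# Toy 3D points; per-point errors are 0.05, 0.15, 0.3 and 1.0.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
pred = [(0.0, 0.0, 0.05), (1.0, 0.0, 0.15), (0.0, 1.0, 0.3), (0.0, 0.0, 2.0)]

score = apd(pred, gt, thresholds=(0.1, 0.2, 0.4))
print(score)
```

Averaging over several thresholds rewards trackers that are both roughly right on most points and precisely right on some, which a single cutoff would not distinguish.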