PTC-Depth: Pose-Refined Monocular Depth Estimation with Temporal Consistency

arXiv cs.CV / 4/3/2026


Key Points

  • The paper addresses a key limitation in monocular depth estimation: lack of temporal consistency across consecutive video frames, which can cause depth jitter and failures during abrupt depth-range changes.
  • It proposes PTC-Depth, a consistency-aware framework that uses wheel odometry to stabilize camera pose and employs optical flow to compute triangulation-based sparse depth.
  • Sparse depth updates a recursive Bayesian estimate of metric scale over time, which is then used to rescale the relative depth output from a pre-trained depth “foundation” model.
  • Experiments on KITTI, TartanAir, MS2, and the authors' own dataset show improved robustness and accuracy for metric depth estimation under temporal dynamics.
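
The scale-update step described above can be sketched as a 1-D Kalman-style recursive Bayesian filter. This is a hypothetical illustration, not the paper's implementation: the measurement is a robust per-frame scale estimate taken as the median ratio between triangulated sparse metric depths and the foundation model's relative depths at the same pixels; the noise variances (`meas_var`, `process_var`) are assumed values.

```python
import numpy as np

def update_scale(prior_mean, prior_var, sparse_depth, rel_depth,
                 meas_var=0.05, process_var=1e-4):
    """One recursive Bayesian (1-D Kalman) update of the metric scale.

    Hypothetical sketch; the paper's exact filter is not specified here.
    sparse_depth: metric depths at sparse triangulated points, shape (N,)
    rel_depth:    relative depths from the foundation model at the same
                  pixels, shape (N,)
    """
    # Predict: the scale is assumed roughly constant between frames,
    # so only inflate the variance slightly.
    pred_var = prior_var + process_var

    # Measurement: robust per-frame scale from the sparse/relative ratio.
    z = np.median(sparse_depth / rel_depth)

    # Kalman gain and posterior update.
    k = pred_var / (pred_var + meas_var)
    post_mean = prior_mean + k * (z - prior_mean)
    post_var = (1.0 - k) * pred_var
    return post_mean, post_var

def to_metric(rel_depth_map, scale):
    """Rescale a relative depth map to metric depth with the filtered scale."""
    return scale * rel_depth_map
```

Because the filter smooths the scale over time, a single noisy frame cannot cause the abrupt depth-range jumps the paper identifies as a failure mode.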

Abstract

Monocular depth estimation (MDE) has been widely adopted in the perception systems of autonomous vehicles and mobile robots. However, existing approaches often struggle to maintain temporal consistency in depth estimation across consecutive frames. This inconsistency not only causes jitter but can also lead to estimation failures when the depth range changes abruptly. To address these challenges, this paper proposes a consistency-aware monocular depth estimation framework that leverages wheel odometry from a mobile robot to achieve stable and coherent depth predictions over time. Specifically, we estimate the camera pose and compute sparse depth via triangulation, using optical flow between consecutive frames to establish correspondences. The sparse depth estimates are used to update a recursive Bayesian estimate of the metric scale, which is then applied to rescale the relative depth predicted by a pre-trained depth estimation foundation model. The proposed method is evaluated on the KITTI, TartanAir, and MS2 datasets, as well as our own dataset, demonstrating robust and accurate depth estimation performance.
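The triangulation step in the abstract can be illustrated with a standard two-view linear (DLT) triangulation: given a relative pose between consecutive frames (e.g. derived from wheel odometry) and pixel correspondences from optical flow, each matched point yields a metric depth. This is a minimal sketch under those assumptions, not the paper's implementation; `K`, `R`, `t`, `uv0`, and `uv1` are illustrative names.

```python
import numpy as np

def triangulate(K, R, t, uv0, uv1):
    """Linear (DLT) triangulation of optical-flow correspondences.

    Hypothetical sketch. K is the 3x3 camera intrinsics; (R, t) is the
    relative pose of frame 1 w.r.t. frame 0, e.g. from wheel odometry;
    uv0/uv1 are matched pixel coordinates (N pairs) from optical flow.
    Returns the per-point metric depth in the frame-0 camera.
    """
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # frame-0 projection
    P1 = K @ np.hstack([R, np.reshape(t, (3, 1))])      # frame-1 projection
    depths = []
    for (u0, v0), (u1, v1) in zip(uv0, uv1):
        # Each view contributes two linear constraints on the homogeneous
        # 3-D point X; stack them and take the SVD null vector.
        A = np.stack([u0 * P0[2] - P0[0],
                      v0 * P0[2] - P0[1],
                      u1 * P1[2] - P1[0],
                      v1 * P1[2] - P1[1]])
        _, _, vt = np.linalg.svd(A)
        X = vt[-1]
        X = X[:3] / X[3]           # homogeneous -> Euclidean
        depths.append(X[2])        # z-coordinate in frame-0 camera
    return np.asarray(depths)
```

Because the odometry provides the translation in metric units, the triangulated depths carry metric scale, which is what allows them to anchor the Bayesian scale estimate applied to the foundation model's relative depth.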