Cross-Vehicle 3D Geometric Consistency for Self-Supervised Surround Depth Estimation on Articulated Vehicles

arXiv cs.AI / 4/6/2026


Key Points

  • The paper proposes ArticuSurDepth, a self-supervised multi-camera framework for surround-view depth estimation, specifically targeting articulated vehicles, which are poorly served by existing passenger-vehicle-centric methods.
  • It improves depth learning by enforcing cross-view geometric consistency via multi-view spatial context enrichment, a cross-view surface normal constraint, and cross-vehicle pose consistency to handle coupled motions across articulated segments.
  • To encourage metric depth, the method adds camera height regularization grounded in ground-plane awareness, aiming to better align predicted depth scales with real-world geometry.
  • The authors validate the approach on a newly built articulated-vehicle experiment platform with a self-collected dataset, and report state-of-the-art performance on both their dataset and established benchmarks including DDAD, nuScenes, and KITTI.
  • The framework is guided by structural priors derived from a vision foundation model to enhance structural coherence across spatial and temporal contexts.
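To make the camera height regularization idea concrete, the sketch below back-projects pixels labeled as ground into 3D using predicted depth, fits a plane by least squares, and penalizes the deviation of the resulting camera-to-ground distance from the known mounting height. This is a minimal illustration under assumed interfaces (the function name, the boolean `ground_mask` input, and the squared-error penalty are all hypothetical), not the paper's implementation:

```python
import numpy as np

def camera_height_regularizer(depth, K, ground_mask, nominal_height):
    """Hypothetical sketch of ground-plane-aware camera height regularization.

    depth:          H x W predicted depth map
    K:              3 x 3 camera intrinsics
    ground_mask:    H x W boolean mask of pixels assumed to lie on the ground
    nominal_height: known camera mounting height above the ground (meters)
    """
    H, W = depth.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    # Back-project each ground pixel: X = d * K^{-1} [u, v, 1]^T
    pix = np.stack([us[ground_mask], vs[ground_mask],
                    np.ones(ground_mask.sum())], axis=0)      # 3 x N
    pts = ((np.linalg.inv(K) @ pix) * depth[ground_mask]).T   # N x 3 points
    # Fit the plane n·x + d = 0 (||n|| = 1) via SVD of the centered points
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]                      # smallest singular vector = plane normal
    d = -n @ centroid
    # The camera center is the origin, so its distance to the plane is |d|
    est_height = abs(d)
    return (est_height - nominal_height) ** 2, est_height
```

In a self-supervised setup, a loss term like this gives the network a metric anchor: predicted depths that imply the wrong camera height are penalized, which helps align the learned depth scale with real-world geometry.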

Abstract

Surround depth estimation provides a cost-effective alternative to LiDAR for 3D perception in autonomous driving. While recent self-supervised methods explore multi-camera settings to improve scale awareness and scene coverage, they are primarily designed for passenger vehicles and rarely consider articulated vehicles or robotic platforms. The articulated structure introduces complex cross-segment geometry and motion coupling, making consistent depth reasoning across views more challenging. In this work, we propose **ArticuSurDepth**, a self-supervised framework for surround-view depth estimation on articulated vehicles that enhances depth learning through cross-view and cross-vehicle geometric consistency, guided by structural priors from a vision foundation model. Specifically, we introduce a multi-view spatial context enrichment strategy and a cross-view surface normal constraint to improve structural coherence across spatial and temporal contexts. We further incorporate camera height regularization with ground-plane awareness to encourage metric depth estimation, together with cross-vehicle pose consistency that bridges motion estimation between articulated segments. To validate the proposed method, we built an articulated-vehicle experiment platform and collected a dataset with it. Experimental results demonstrate state-of-the-art (SoTA) depth estimation performance on our self-collected dataset as well as on the DDAD, nuScenes, and KITTI benchmarks.