InCaRPose: In-Cabin Relative Camera Pose Estimation Model and Dataset

arXiv cs.CV / 4/7/2026


Key Points

  • InCaRPose introduces a Transformer-based model for robust relative camera pose estimation aimed at in-cabin automotive monitoring and camera extrinsic calibration under severe distortion (e.g., fisheye lenses).
  • The method uses frozen backbone features (DINOv3) with a Transformer decoder to infer geometric relationships between reference and target images in a single inference step, including absolute metric-scale translation within realistic mount-adjustment bounds.
  • To handle highly distorted automotive interiors, the approach is trained exclusively on synthetic data and is designed to generalize to real-world cabins without requiring identical camera intrinsics.
  • The paper reports competitive results on the public 7-Scenes dataset and maintains high rotation/translation precision even with a ViT-Small backbone, targeting real-time use cases such as driver monitoring in (supervised) autonomous driving.
  • Alongside the model, the authors release the In-Cabin-Pose dataset of highly distorted vehicle-interior images and provide code on GitHub.
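The quantity the model predicts, the relative pose (rotation plus metric-scale translation) between a reference and a target camera, has a simple closed form when both world-to-camera extrinsics are known. The sketch below is a plain-NumPy illustration of that geometry, not code from the paper; the function names and the z-axis test rotation are illustrative only.

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the z-axis by theta radians (illustrative helper)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def relative_pose(R_ref, t_ref, R_tgt, t_tgt):
    """Relative pose mapping points from the reference camera frame to the
    target camera frame, given world-to-camera extrinsics (R, t) per view,
    i.e. x_cam = R @ x_world + t."""
    R_rel = R_tgt @ R_ref.T
    t_rel = t_tgt - R_rel @ t_ref
    return R_rel, t_rel

# Toy extrinsics for two mounted cameras (made-up numbers).
R_ref, t_ref = rotation_z(0.1), np.array([0.20, 0.00, 0.0])
R_tgt, t_tgt = rotation_z(0.4), np.array([0.25, 0.05, 0.0])
R_rel, t_rel = relative_pose(R_ref, t_ref, R_tgt, t_tgt)

# Sanity check: transporting a point through the relative pose matches
# going world -> target directly.
x_world = np.array([1.0, 2.0, 3.0])
x_ref = R_ref @ x_world + t_ref
x_tgt = R_tgt @ x_world + t_tgt
print(np.allclose(R_rel @ x_ref + t_rel, x_tgt))
```

Note that `t_rel` here is in metric units whenever the extrinsics are; this is the absolute-scale translation the paper targets, in contrast to two-view methods that recover translation only up to scale.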

Abstract

Camera extrinsic calibration is a fundamental task in computer vision. However, precise relative pose estimation in constrained, highly distorted environments, such as in-cabin automotive monitoring (ICAM), remains challenging. We present InCaRPose, a Transformer-based architecture designed for robust relative pose prediction between image pairs, which can be used for camera extrinsic calibration. By leveraging frozen backbone features such as DINOv3 and a Transformer-based decoder, our model effectively captures the geometric relationship between a reference and a target view. Unlike traditional methods, our approach achieves absolute metric-scale translation within the physically plausible adjustment range of in-cabin camera mounts in a single inference step, which is critical for ICAM, where accurate real-world distances are required for safety-relevant perception. We specifically address the challenges of highly distorted fisheye cameras in automotive interiors by training exclusively on synthetic data. Our model generalizes to real-world cabin environments without relying on identical camera intrinsics and additionally achieves competitive performance on the public 7-Scenes dataset. Despite limited training data, InCaRPose maintains high precision in both rotation and translation, even with a ViT-Small backbone. This enables real-time performance for time-critical inference, such as driver monitoring in supervised autonomous driving. We release our real-world In-Cabin-Pose test dataset consisting of highly distorted vehicle-interior images and our code at https://github.com/felixstillger/InCaRPose.