Reliev3R: Relieving Feed-forward Reconstruction from Multi-View Geometric Annotations

arXiv cs.CV / 4/2/2026


Key Points

  • The paper introduces Reliev3R, a weakly-supervised training framework for feed-forward reconstruction models that reduces dependence on expensive multi-view geometric annotations like 3D point maps and camera poses.
  • Instead of relying on costly structure-from-motion preprocessing, it leverages monocular relative depths and sparse image correspondences obtained as zero-shot predictions from pretrained models (see the sketch after this list).
  • Reliev3R proposes an ambiguity-aware relative depth loss and a trigonometry-based reprojection loss to enforce multi-view geometric consistency during training (see the loss sketch after the abstract).
  • The authors report that, trained from scratch on less data, Reliev3R reaches performance comparable to fully-supervised FFRMs, a step toward more scalable and lower-cost 3D reconstruction supervision.

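To make the pseudo-supervision idea concrete, below is a minimal sketch of how such weak 3D labels could be harvested from off-the-shelf zero-shot predictors, assuming PyTorch. The helpers `load_monocular_depth_model` and `load_sparse_matcher` are hypothetical placeholders (a MiDaS/DPT-style depth network and a LoFTR/SuperGlue-style matcher would fit the paper's description); this is not the authors' actual pipeline.

```python
# Illustrative sketch only: the loader helpers below are hypothetical
# stand-ins for whatever zero-shot predictors the paper actually uses.
import torch

def pseudo_labels(images: torch.Tensor):
    """Derive weak 3D supervision from pretrained zero-shot predictors.

    images: (V, 3, H, W) tensor holding V views of the same scene.
    Returns per-view relative depth maps and cross-view sparse matches,
    i.e., the only "annotations" the weakly-supervised scheme consumes.
    """
    # Hypothetical monocular depth predictor (e.g., a MiDaS/DPT-style model).
    # Its output is depth only up to an unknown per-image scale and shift.
    depth_model = load_monocular_depth_model()  # assumed helper, not a real API
    rel_depths = torch.stack([depth_model(img) for img in images])  # (V, H, W)

    # Hypothetical sparse matcher (e.g., a LoFTR/SuperGlue-style model).
    # It yields pixel correspondences between view pairs, with no SfM
    # preprocessing and no camera poses required.
    matcher = load_sparse_matcher()             # assumed helper, not a real API
    matches = {}
    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            # (K, 4) rows of [u_i, v_i, u_j, v_j] pixel matches
            matches[(i, j)] = matcher(images[i], images[j])
    return rel_depths, matches
```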
Abstract

With recent advances, Feed-forward Reconstruction Models (FFRMs) have demonstrated great potential in reconstruction quality and adaptability to multiple downstream tasks. However, their excessive reliance on multi-view geometric annotations, e.g., 3D point maps and camera poses, makes the fully-supervised training scheme of FFRMs difficult to scale up. In this paper, we propose Reliev3R, a weakly-supervised paradigm for training FFRMs from scratch without cost-prohibitive multi-view geometric annotations. Relieving the reliance on geometric sensory data and compute-intensive structure-from-motion preprocessing, our method draws 3D knowledge directly from monocular relative depths and sparse image correspondences given by zero-shot predictions of pretrained models. At the core of Reliev3R, we design an ambiguity-aware relative depth loss and a trigonometry-based reprojection loss to supervise multi-view geometric consistency. Training from scratch with less data, Reliev3R catches up with its fully-supervised sibling models, taking a step towards low-cost 3D reconstruction supervision and scalable FFRMs.
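The abstract names two losses but does not reproduce their formulas, so the sketch below substitutes common stand-ins under stated assumptions: a scale-and-shift-invariant alignment in place of the ambiguity-aware relative depth loss (absorbing the scale/shift ambiguity of monocular predictions), and a sparse reprojection error in place of the trigonometry-based reprojection loss. The point-map and camera output conventions, and all tensor shapes, are assumptions rather than the paper's interface.

```python
# Hedged stand-ins for the two losses named in the abstract; the paper's
# exact formulations may differ.
import torch
import torch.nn.functional as F

def ssi_depth_loss(pred_depth: torch.Tensor, mono_depth: torch.Tensor):
    """Scale-and-shift-invariant depth loss, a common stand-in for an
    ambiguity-aware relative depth loss.

    pred_depth, mono_depth: (H, W) tensors (predicted vs. monocular depth).
    """
    d = pred_depth.flatten()
    m = mono_depth.flatten()
    # Least-squares scale s and shift t aligning the monocular prediction
    # to the network output, absorbing the predictor's unknown scale/shift.
    A = torch.stack([m, torch.ones_like(m)], dim=1)       # (N, 2)
    sol = torch.linalg.lstsq(A, d.unsqueeze(1)).solution  # (2, 1) -> [s, t]
    aligned = (A @ sol).squeeze(1)
    return (aligned - d).abs().mean()

def reprojection_loss(points_i: torch.Tensor, cam_j: dict,
                      matches_ij: torch.Tensor):
    """Sparse reprojection consistency between two views.

    points_i:   (H, W, 3) point map predicted for view i in a shared frame.
    cam_j:      dict with 'K' (3, 3) intrinsics and 'T' (4, 4) world-to-camera
                extrinsics for view j -- an assumed model-output convention.
    matches_ij: (K, 4) float rows of [u_i, v_i, u_j, v_j] pixel matches.
    """
    ui, vi = matches_ij[:, 0].long(), matches_ij[:, 1].long()
    X = points_i[vi, ui]                           # (K, 3) matched 3D points
    X_h = F.pad(X, (0, 1), value=1.0)              # homogeneous coordinates
    X_cam = (cam_j['T'] @ X_h.T).T[:, :3]          # move into view-j frame
    uv = (cam_j['K'] @ X_cam.T).T                  # pinhole projection
    uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)
    # Points seen at (u_i, v_i) should reproject onto their matches in view j.
    return (uv - matches_ij[:, 2:]).abs().mean()
```

The least-squares alignment is what makes the depth term "relative": the network is only asked to agree with the monocular prediction up to an unknown scale and shift, while the reprojection term ties the per-view predictions together across views through the sparse matches.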