MV-SAM3D: Adaptive Multi-View Fusion for Layout-Aware 3D Generation
arXiv cs.CV / 3/13/2026
Key Points
- MV-SAM3D extends layout-aware 3D generation to multi-view inputs by formulating a Multi-Diffusion process in a shared 3D latent space, enabling more accurate and view-consistent scene reconstructions (a code sketch follows this list).
- It introduces two adaptive weighting strategies, attention-entropy weighting and visibility weighting, which perform confidence-aware fusion across viewpoints based on local observation reliability; both appear in the same sketch below.
- The framework incorporates physics-aware optimization to enforce collision and contact constraints during and after generation, yielding physically plausible multi-object layouts (a toy version is sketched after the list).
- Importantly, MV-SAM3D is training-free and demonstrates significant improvements in reconstruction fidelity and layout plausibility on benchmarks and real-world scenes, with code available on GitHub.
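
The bullets above only name the mechanisms, so here is a minimal PyTorch sketch of what confidence-aware fusion inside a Multi-Diffusion-style denoising loop could look like. Every name, tensor shape, and combination rule here (softmax over negative attention entropy, multiplicative mixing with visibility, weighted averaging of per-view predictions) is our assumption for illustration, not MV-SAM3D's actual formulation.

```python
import torch

def attention_entropy_weights(attn, eps=1e-8):
    # attn: (V, N, K) per-view attention over K tokens for each of N latent
    # cells. Peaked (low-entropy) attention is read as a confident observation.
    p = attn.clamp_min(eps)
    entropy = -(p * p.log()).sum(dim=-1)      # (V, N)
    return torch.softmax(-entropy, dim=0)     # normalized across the V views

def visibility_weights(vis, eps=1e-8):
    # vis: (V, N) soft visibility of each latent cell from each view.
    return vis / vis.sum(dim=0, keepdim=True).clamp_min(eps)

def fuse_views(preds, attn, vis):
    # preds: (V, N, C) per-view predictions for the shared 3D latent.
    # Combine both confidence signals multiplicatively, renormalize over
    # views, and take a weighted average (our assumption, not the paper's).
    w = attention_entropy_weights(attn) * visibility_weights(vis)  # (V, N)
    w = w / w.sum(dim=0, keepdim=True).clamp_min(1e-8)
    return (w.unsqueeze(-1) * preds).sum(dim=0)                    # (N, C)

@torch.no_grad()
def multi_diffusion(latent, views, timesteps):
    # One shared 3D latent denoised jointly from V views. `views` is a list
    # of (denoiser, cond) pairs; each hypothetical denoiser returns its
    # prediction, an attention map, and a visibility estimate per cell.
    for t in timesteps:
        preds, attns, vis = [], [], []
        for denoise, cond in views:
            pred, a, v = denoise(latent, cond, t)   # (N, C), (N, K), (N,)
            preds.append(pred); attns.append(a); vis.append(v)
        latent = fuse_views(torch.stack(preds), torch.stack(attns),
                            torch.stack(vis))
    return latent
```

One design note on the sketch: multiplying the two weight maps treats attention confidence and visibility as independent reliability signals; an additive or learned mixture would be just as defensible given only the summary above.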
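
Similarly, the physics-aware optimization can be pictured as gradient descent on penetration and contact penalties over object poses. The sketch below uses bounding-sphere proxies, a flat y-up floor, and a post-hoc refinement loop, all illustrative assumptions; the paper applies its constraints during generation as well, and presumably on the generated geometry rather than spheres.

```python
import torch

def collision_loss(centers, radii):
    # Penetration penalty between all object pairs, using bounding spheres
    # as stand-ins for the real meshes (an illustrative simplification).
    diff = centers.unsqueeze(0) - centers.unsqueeze(1)       # (K, K, 3)
    dist = (diff.square().sum(dim=-1) + 1e-9).sqrt()         # eps avoids
    min_sep = radii.unsqueeze(0) + radii.unsqueeze(1)        # NaN grads at 0
    return torch.relu(min_sep - dist).triu(diagonal=1).sum()

def contact_loss(centers, radii, floor_y=0.0):
    # Pull each object's lowest point onto the floor plane (y-up assumed).
    return (centers[:, 1] - radii - floor_y).abs().sum()

def optimize_layout(centers, radii, steps=200, lr=1e-2, contact_w=0.1):
    # Refine object translations so the layout is collision-free and rests
    # on the support surface; the shapes themselves stay fixed.
    centers = centers.clone().requires_grad_(True)
    opt = torch.optim.Adam([centers], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = collision_loss(centers, radii) \
             + contact_w * contact_loss(centers, radii)
        loss.backward()
        opt.step()
    return centers.detach()
```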