Dense Dynamic Scene Reconstruction and Camera Pose Estimation from Multi-View Videos
arXiv cs.CV / 3/13/2026
Key Points
- The paper addresses dense dynamic scene reconstruction and camera pose estimation from multiple freely moving cameras by proposing a two-stage optimization framework that decouples robust camera tracking from dense depth refinement.
- In stage one, the method extends single-camera visual SLAM to multi-camera setups by building a spatiotemporal connection graph that exploits intra-camera temporal continuity and inter-camera spatial overlap. A wide-baseline initialization strategy based on feed-forward reconstruction models keeps tracking robust when the cameras share limited overlap.
- In stage two, depth and camera poses are refined by enforcing dense inter- and intra-camera consistency through wide-baseline optical flow.
- The work introduces MultiCamRobolab, a real-world dataset with ground-truth poses from a motion capture system.
- Experiments show the method significantly outperforms state-of-the-art feed-forward models on synthetic and real-world benchmarks and uses less memory.
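
To make the stage-one idea concrete, here is a minimal Python sketch of how a spatiotemporal connection graph might be assembled. It is an illustration under stated assumptions, not the paper's implementation: the function name `build_spatiotemporal_graph`, the `temporal_radius` and `min_overlap` parameters, and the user-supplied `overlap` scoring callable are all hypothetical, and restricting inter-camera edges to frames with the same timestamp is a simplification the summary does not confirm.

```python
from itertools import combinations
from typing import Callable, Set, Tuple

Node = Tuple[int, int]   # (camera index, frame index)
Edge = Tuple[Node, Node]


def build_spatiotemporal_graph(
    num_cams: int,
    num_frames: int,
    temporal_radius: int,
    overlap: Callable[[Node, Node], float],
    min_overlap: float,
) -> Set[Edge]:
    """Return edges linking frames within and across cameras.

    All parameters are hypothetical; `overlap` stands in for whatever
    view-overlap score a real system would compute.
    """
    edges: Set[Edge] = set()

    # Intra-camera edges: temporal continuity within each video.
    for cam in range(num_cams):
        for t in range(num_frames):
            for dt in range(1, temporal_radius + 1):
                if t + dt < num_frames:
                    edges.add(((cam, t), (cam, t + dt)))

    # Inter-camera edges: spatial overlap between cameras.
    # Assumption: only frames captured at the same time index are compared.
    for t in range(num_frames):
        for cam_a, cam_b in combinations(range(num_cams), 2):
            if overlap((cam_a, t), (cam_b, t)) >= min_overlap:
                edges.add(((cam_a, t), (cam_b, t)))

    return edges
```

In a sketch like this, the edge set would feed a joint pose optimization. Pairs that fail the overlap test are simply left unconnected, which is the situation where the summary says a wide-baseline initialization helps.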