SEAR: Simple and Efficient Adaptation of Visual Geometric Transformers for RGB+Thermal 3D Reconstruction
arXiv cs.CV / 3/20/2026
Key Points
- SEAR is a simple fine-tuning strategy that adapts a pretrained visual geometric transformer to multimodal RGB-T inputs for improved 3D reconstruction and camera pose estimation.
- On a relatively small RGB-T dataset, SEAR significantly outperforms state-of-the-art methods, including a notable 29% gain in AUC@30.
- The approach yields finer detail and better alignment between the RGB and thermal modalities, even under challenging conditions such as low lighting and dense smoke, while adding negligible inference-time overhead compared to the original pretrained model.
- The authors introduce a new RGB-T dataset with sequences captured across varying times, viewpoints, and illumination to serve as a robust benchmark for multimodal 3D scene reconstruction.
- Code and pretrained models are publicly available on GitHub, facilitating replication and practical adoption.
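The article does not spell out how SEAR adapts the pretrained transformer to a fourth (thermal) input channel. One common way such adaptations are done, shown purely as a hypothetical sketch and not as the paper's actual recipe, is to "inflate" the pretrained RGB patch-embedding kernel with an extra channel initialized from the mean of the RGB channels, so the adapted model initially behaves like the pretrained one:

```python
import numpy as np

# Hypothetical illustration only -- not the paper's actual SEAR method.
# Adapt a pretrained RGB patch-embedding kernel to accept a fourth
# (thermal) input channel by inflating the channel dimension.

def inflate_patch_embed(rgb_weight: np.ndarray) -> np.ndarray:
    """rgb_weight: (embed_dim, 3, patch, patch) pretrained kernel.

    Returns an (embed_dim, 4, patch, patch) kernel whose extra channel
    is initialized to the mean of the three RGB channels, a standard
    trick for warm-starting a multimodal fine-tune from an RGB model.
    """
    thermal = rgb_weight.mean(axis=1, keepdims=True)  # (D, 1, p, p)
    return np.concatenate([rgb_weight, thermal], axis=1)

# Toy example: embed_dim=8, 3 input channels, 16x16 patches.
w_rgb = np.random.randn(8, 3, 16, 16).astype(np.float32)
w_rgbt = inflate_patch_embed(w_rgb)
print(w_rgbt.shape)  # (8, 4, 16, 16)
```

The function name and initialization scheme here are illustrative assumptions; consult the authors' released code on GitHub for the actual adaptation strategy.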