Last-Layer-Centric Feature Recombination: Unleashing 3D Geometric Knowledge in DINOv3 for Monocular Depth Estimation
arXiv cs.CV / 4/30/2026
Key Points
- The paper argues that monocular depth estimation (MDE) benefits from vision foundation models, but that existing DINO-based methods may underuse the 3D cues these models contain because they sample transformer layers at uniform intervals.
- A layer-wise analysis of DINOv3 finds that geometric and depth information is distributed non-uniformly across layers: deeper layers offer stronger depth predictability and capture more inter-sample geometric variation.
- To exploit this, the authors propose a Last-Layer-Centric Feature Recombination (LFR) module that treats the final transformer layer as a geometric anchor and adaptively selects complementary intermediate layers using a minimal-similarity criterion.
- The selected intermediate features are fused with the last-layer representation through compact linear adapters, improving geometric expressiveness for dense prediction.
- Experiments show consistent gains in MDE accuracy, with reported state-of-the-art performance, and offer insights into where 3D knowledge resides inside vision foundation models.
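The summary does not say how per-layer depth predictability is measured. One common way to quantify it is a linear probe. The sketch below assumes a ridge-regression probe from token features to depth, scored by held-out R², and runs it on synthetic features in which the depth signal strengthens with layer index. The names `ridge_probe_r2` and `layerwise_depth_predictability` are hypothetical and are not the authors' code.

```python
import numpy as np


def ridge_probe_r2(features: np.ndarray, depth: np.ndarray, alpha: float = 1e-3) -> float:
    """Fit a closed-form ridge probe on the first half of the tokens and
    return R^2 on the held-out second half (assumed probing protocol)."""
    n = features.shape[0]
    split = n // 2
    # Append a constant column so the probe can learn an offset.
    x = np.hstack([features, np.ones((n, 1))])
    x_tr, x_te = x[:split], x[split:]
    y_tr, y_te = depth[:split], depth[split:]
    # Ridge solution: w = (X^T X + alpha * I)^-1 X^T y
    gram = x_tr.T @ x_tr + alpha * np.eye(x.shape[1])
    w = np.linalg.solve(gram, x_tr.T @ y_tr)
    pred = x_te @ w
    ss_res = np.sum((y_te - pred) ** 2)
    ss_tot = np.sum((y_te - y_te.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def layerwise_depth_predictability(layer_features, depth, alpha=1e-3):
    """Probe every layer independently; a higher score means depth is more
    linearly decodable from that layer's features."""
    return [ridge_probe_r2(f, depth, alpha) for f in layer_features]


# Synthetic stand-in for per-layer token features: the depth signal is
# injected into one channel with a strength that grows with "layer index".
rng = np.random.default_rng(0)
num_tokens, dim = 2000, 16
depth = rng.normal(size=num_tokens)
layers = []
for strength in [0.0, 0.5, 2.0, 8.0]:
    feats = rng.normal(size=(num_tokens, dim))
    feats[:, 0] += strength * depth
    layers.append(feats)

scores = layerwise_depth_predictability(layers, depth)
print([round(s, 3) for s in scores])
```

On this toy data, the scores rise with the injected signal strength. With real DINOv3 features, the same loop would take one token-feature matrix per transformer layer and per-token ground-truth depth.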
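The minimal-similarity criterion is not specified in detail here. As an illustration only, the following sketch assumes linear CKA (centered kernel alignment) as the similarity measure. It treats the last layer as the anchor and picks the k intermediate layers least similar to it. The paper's actual measure and selection procedure may differ, and `linear_cka` and `select_complementary_layers` are hypothetical helpers.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two (tokens, dim) matrices. The value lies in
    [0, 1]; it equals 1 when the two token Gram matrices are proportional."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def select_complementary_layers(layer_features, k):
    """Treat the last layer as the geometric anchor and return the k
    intermediate layers least similar to it (in ascending index order),
    plus the similarity score of every candidate."""
    anchor = layer_features[-1]
    candidates = list(range(len(layer_features) - 1))
    sims = {i: linear_cka(layer_features[i], anchor) for i in candidates}
    chosen = sorted(candidates, key=lambda i: sims[i])[:k]
    return sorted(chosen), sims


# Toy example: layers 1 and 3 are near-copies of the anchor, while
# layers 0 and 2 carry unrelated (complementary) information.
rng = np.random.default_rng(0)
anchor = rng.normal(size=(200, 8))
layers = [
    rng.normal(size=(200, 8)),                 # unrelated to the anchor
    anchor + 0.1 * rng.normal(size=(200, 8)),  # near-copy of the anchor
    rng.normal(size=(200, 8)),                 # unrelated to the anchor
    2.0 * anchor,                              # rescaled copy (CKA = 1)
    anchor,                                    # last layer = anchor
]
chosen, sims = select_complementary_layers(layers, k=2)
print(chosen)  # → [0, 2]
```

Ranking purely by similarity to the anchor is the simplest reading of "minimal similarity." A greedy variant that also penalizes similarity among the already-chosen layers would be another plausible design.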
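The exact form of the "compact linear adapters" is not described in the summary. The following minimal sketch assumes one low-rank linear adapter per selected intermediate layer, with each adapter's output added residually to the last-layer tokens. The class name `LinearAdapterFusion`, the rank bottleneck, and the zero initialization are all assumptions made for illustration.

```python
import numpy as np


class LinearAdapterFusion:
    """Hypothetical fusion: fused = last + sum_i (h_i @ down_i) @ up_i.

    Each selected intermediate layer gets its own compact adapter: a linear
    down-projection to `rank` dimensions followed by a linear up-projection
    back to `dim`. The up-projections start at zero, so the module initially
    returns the last-layer features unchanged."""

    def __init__(self, dim: int, num_selected: int, rank: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(0.0, dim ** -0.5, size=(num_selected, dim, rank))
        self.up = np.zeros((num_selected, rank, dim))

    def __call__(self, last: np.ndarray, selected: list) -> np.ndarray:
        # last: (tokens, dim); selected: list of (tokens, dim) arrays
        if len(selected) != self.down.shape[0]:
            raise ValueError("expected one feature map per adapter")
        fused = last.copy()
        for h, down, up in zip(selected, self.down, self.up):
            fused += (h @ down) @ up
        return fused


rng = np.random.default_rng(1)
tokens, dim = 196, 32
last = rng.normal(size=(tokens, dim))
selected = [rng.normal(size=(tokens, dim)) for _ in range(2)]

fusion = LinearAdapterFusion(dim=dim, num_selected=2, rank=8)
out = fusion(last, selected)
print(out.shape, np.allclose(out, last))  # → (196, 32) True
```

Zero-initializing the up-projections means training starts from the plain last-layer representation, which is a common choice for residual adapters. Concatenating the features and applying a single linear projection would be an equally plausible reading of the summary.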