
Semantic-Aware Feature Extraction for Enhanced 3D Reconstruction

arXiv cs.CV / 3/17/2026


Key Points

  • The paper introduces a semantic-aware feature extraction framework that jointly trains keypoint detection, keypoint description, and semantic segmentation through multi-task learning to improve feature matching.
  • It adds a deep matching module to strengthen correspondences and evaluates the method on data from a monocular fisheye camera mounted on a vehicle driving through a multi-floor parking structure; this enables semantic 3D reconstruction with elevation estimation.
  • The method produces semantically annotated 3D point clouds that reveal elevation changes and support multi-level mapping beyond purely geometric reconstruction.
  • Experimental results show improved structural detail and feature match consistency when semantic cues are integrated, highlighting potential gains for SLAM, image stitching, and 3D reconstruction workflows.
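The joint training described in the key points can be sketched as a weighted multi-task objective. The loss names and weights below are illustrative assumptions; the paper does not state its loss balancing:

```python
# Illustrative multi-task objective: a weighted sum of the three task
# losses (keypoint detection, description, semantic segmentation).
# The default weights are hypothetical, not values from the paper.
def multitask_loss(det_loss, desc_loss, seg_loss,
                   weights=(1.0, 1.0, 0.5)):
    w_det, w_desc, w_seg = weights
    return w_det * det_loss + w_desc * desc_loss + w_seg * seg_loss
```

In practice each term would come from a head on a shared encoder, and the balancing could also be learned (e.g. uncertainty weighting) rather than fixed.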

Abstract

Feature matching is a fundamental problem in computer vision with wide-ranging applications, including simultaneous localization and mapping (SLAM), image stitching, and 3D reconstruction. While recent advances in deep learning have improved keypoint detection and description, most approaches focus primarily on geometric attributes and often neglect higher-level semantic information. This work proposes a semantic-aware feature extraction framework that employs multi-task learning to jointly train keypoint detection, keypoint description, and semantic segmentation. The method is benchmarked against standard feature matching techniques and evaluated in the context of 3D reconstruction. To enhance feature correspondence, a deep matching module is integrated. The system is tested using input from a single monocular fisheye camera mounted on a vehicle and evaluated within a multi-floor parking structure. The proposed approach supports semantic 3D reconstruction with altitude estimation, capturing elevation changes and enabling multi-level mapping. Experimental results demonstrate that the method produces semantically annotated 3D point clouds with improved structural detail and elevation information, underscoring the effectiveness of joint training with semantic cues for more consistent feature matching and enhanced 3D reconstruction.
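One plausible way semantic cues can tighten feature correspondence, sketched here as an assumption rather than the paper's actual matching module, is to combine mutual nearest-neighbor descriptor matching with a semantic-label consistency check:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, labels_a, labels_b):
    """Cosine-similarity mutual nearest-neighbor matching that keeps
    only pairs whose semantic labels agree. A hypothetical sketch of
    semantics-filtered matching, not the paper's implementation."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T                 # pairwise cosine similarity
    nn_ab = sim.argmax(axis=1)    # best match in B for each point in A
    nn_ba = sim.argmax(axis=0)    # best match in A for each point in B
    matches = []
    for i, j in enumerate(nn_ab):
        # keep mutual nearest neighbors with matching semantic class
        if nn_ba[j] == i and labels_a[i] == labels_b[j]:
            matches.append((i, j))
    return matches
```

Rejecting matches across semantic boundaries (e.g. a pillar keypoint matched to a wall keypoint) is one simple mechanism by which joint semantic training could yield the more consistent matching the abstract reports.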