GESS: Multi-cue Guided Local Feature Learning via Geometric and Semantic Synergy

arXiv cs.CV / 4/8/2026

📰 News · Models & Research

Key Points

  • The paper introduces GESS, a multi-cue guided framework for improving computer-vision local feature detection and description by jointly leveraging semantic and geometric cues.
  • It uses two lightweight prediction heads—one for semantic-normal coupling via a shared 3D vector field and another for depth stability via geometric consistency—to reduce optimization interference and improve keypoint reliability.
  • A Semantic-Depth Aware Keypoint (SDAK) mechanism reweights keypoint responses using semantic reliability and depth stability to suppress spurious features in unreliable regions.
  • For descriptors, it proposes a Unified Triple-Cue Fusion (UTCF) module with a semantic-scheduled gating strategy to adaptively inject multi-attribute information and enhance discriminability.
  • Experiments across four benchmarks report improved robustness and descriptor quality, and the authors indicate code and pretrained models will be released on GitHub.
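The SDAK reweighting described above can be sketched in a few lines. The summary does not give the paper's exact coupling formula, so this is a minimal NumPy sketch assuming an element-wise multiplicative coupling of the two cue maps; the function name `sdak_reweight` and the renormalization step are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sdak_reweight(response, semantic_rel, depth_stab, eps=1e-6):
    """Reweight a keypoint response map by semantic reliability and
    depth stability, suppressing responses in unreliable regions.

    All inputs are H x W maps; the two cue maps are assumed to lie in
    [0, 1]. (Hypothetical formulation: the paper's exact coupling is
    not given in this summary.)
    """
    # Couple the two cues multiplicatively, so a region must be both
    # semantically reliable AND depth-stable to keep its response.
    weight = semantic_rel * depth_stab
    reweighted = response * weight
    # Renormalize so the strongest surviving response is ~1.
    return reweighted / (reweighted.max() + eps)

# Toy example: the left half of the image is semantically unreliable
# (e.g. sky or a dynamic object), so its responses get suppressed.
rng = np.random.default_rng(0)
resp = rng.random((8, 8))
sem = np.ones((8, 8)); sem[:, :4] = 0.1
dep = np.ones((8, 8))
out = sdak_reweight(resp, sem, dep)
```

After reweighting, keypoints would be selected by the usual non-maximum suppression / top-k step on `out` instead of the raw response map.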

Abstract

Robust local feature detection and description are foundational tasks in computer vision. Existing methods primarily rely on single appearance cues for modeling, leading to unstable keypoints and insufficient descriptor discriminability. In this paper, we propose a multi-cue guided local feature learning framework that leverages semantic and geometric cues to synergistically enhance detection robustness and descriptor discriminability. Specifically, we construct a joint semantic-normal prediction head and a depth stability prediction head atop a lightweight backbone. The former leverages a shared 3D vector field to deeply couple semantic and normal cues, thereby resolving optimization interference from heterogeneous inconsistencies. The latter quantifies the reliability of local regions from a geometric consistency perspective, providing deterministic guidance for robust keypoint selection. Based on these predictions, we introduce the Semantic-Depth Aware Keypoint (SDAK) mechanism for feature detection. By coupling semantic reliability with depth stability, SDAK reweights keypoint responses to suppress spurious features in unreliable regions. For descriptor construction, we design a Unified Triple-Cue Fusion (UTCF) module, which employs a semantic-scheduled gating mechanism to adaptively inject multi-attribute features, improving descriptor discriminability. Extensive experiments on four benchmarks validate the effectiveness of the proposed framework. The source code and pre-trained model will be available at: https://github.com/yiyscut/GESS.git.
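The UTCF module's semantic-scheduled gating can likewise be illustrated with a small sketch. Assuming the three cues are appearance, semantic, and geometric feature maps of equal dimension, a simple reading of "semantic-scheduled gating" is that per-location gate weights are predicted from the semantic feature and used to mix the three cues; the names `utcf_fuse`, `w_gate`, and the shapes below are assumptions for illustration, not the module's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def utcf_fuse(appearance, semantic, geometric, w_gate):
    """Fuse three per-location cue features into one descriptor via a
    semantic-scheduled gate.

    Gate logits are predicted from the semantic feature by a linear
    map `w_gate`, then softmaxed over the three cues. Shapes: each cue
    is (N, D); w_gate is (D, 3). Hypothetical sketch -- the real UTCF
    architecture is not detailed in this summary.
    """
    gates = softmax(semantic @ w_gate, axis=-1)         # (N, 3)
    cues = np.stack([appearance, semantic, geometric])  # (3, N, D)
    fused = (gates.T[:, :, None] * cues).sum(axis=0)    # (N, D)
    # L2-normalize, as is standard practice for local descriptors.
    return fused / (np.linalg.norm(fused, axis=-1, keepdims=True) + 1e-8)
```

Because the gate depends on the semantic feature, the mixing ratio adapts per location, e.g. leaning on geometric cues in texture-poor regions, which is one plausible way to realize the adaptive multi-attribute injection the abstract describes.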