COMPASS: COmpact Multi-channel Prior-map And Scene Signature for Floor-Plan-Based Visual Localization
arXiv cs.CV · April 29, 2026
Key Points
- The paper introduces COMPASS, a visual localization algorithm that leverages both geometric and semantic priors from architectural floor plans, rather than relying on geometry alone.
- COMPASS builds a multi-channel radial descriptor, inspired by scan context, with 360 azimuth bins and five channels per bin: normalized range, structural hit type (wall/window/opening), range gradient, inverse range, and local range variance.
- On the vision side, the method populates the same descriptor structure by detecting structural elements in dual-fisheye images, enabling structural matching between the camera view and the floor-plan-derived descriptor.
- As a first step toward full cross-modal matching, the authors propose a fisheye window detection algorithm based on line segment detection, vertical edge clustering, and brightness verification.
- In a proof-of-concept using the Hilti-Trimble SLAM Challenge 2026 dataset, window/wall patterns from the first camera frames show close correspondence to the floor-plan descriptor, supporting the feasibility of cross-modal structural localization.
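The descriptor and matching scheme described in the key points can be sketched as follows. This is a minimal illustration, not the paper's implementation: the channel normalizations, window sizes, and the L1 scoring over circular shifts are all assumptions chosen for clarity, and the function names are invented.

```python
import numpy as np

N_BINS = 360  # one azimuth bin per degree, as described in the paper

def build_descriptor(ranges, hit_types, max_range=20.0, var_window=5):
    """Build a 5 x N_BINS radial descriptor from a 360-degree scan.

    ranges    : (N_BINS,) distance to the first structural hit per azimuth bin
    hit_types : (N_BINS,) code per hit (e.g. 0 = wall, 1 = window, 2 = opening)

    Channel layout follows the paper's list; exact normalizations are assumed.
    """
    r = np.clip(np.asarray(ranges, dtype=float), 1e-3, max_range)
    desc = np.empty((5, N_BINS))
    desc[0] = r / max_range                                # normalized range
    desc[1] = np.asarray(hit_types, dtype=float)           # structural hit type
    desc[2] = 0.5 * (np.roll(r, -1) - np.roll(r, 1))       # circular range gradient
    desc[3] = 1.0 / r                                      # inverse range
    half = var_window // 2                                 # local range variance,
    win = np.stack([np.roll(r, k) for k in range(-half, half + 1)])
    desc[4] = win.var(axis=0)                              # over a circular window
    return desc

def best_alignment(d_map, d_cam):
    """Find the circular column shift of d_cam minimizing mean L1 distance
    to the floor-plan descriptor d_map (the shift doubles as a heading guess)."""
    scores = [np.abs(d_map - np.roll(d_cam, s, axis=1)).mean()
              for s in range(N_BINS)]
    s = int(np.argmin(scores))
    return s, scores[s]
```

Because every channel is computed circularly, rotating the input scan simply rolls the descriptor's columns, which is why an exhaustive shift search recovers both the match score and the relative heading.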