Lorentz Framework for Semantic Segmentation

arXiv cs.CV / 4/21/2026

Key Points

  • The paper introduces a new, architecture-agnostic semantic segmentation framework in hyperbolic space using the Lorentz model, addressing limitations of the commonly used Poincaré ball model (numerical instability, optimization difficulties, and computational challenges).
  • It covers both pixel-wise and mask classification, using text embeddings that carry semantic and visual cues to guide hierarchical pixel-level representations in Lorentz space.
  • The method enables stable and efficient optimization without relying on a Riemannian optimizer, and it can be integrated with existing Euclidean segmentation architectures.
  • Beyond segmentation accuracy, the approach provides uncertainty estimation at no extra cost, confidence maps, boundary delineation, hierarchical and text-based retrieval, and zero-shot performance; the authors also report convergence to flatter minima that generalize better.
  • Extensive evaluations on four major datasets (ADE20K, COCO-Stuff-164k, Pascal-VOC, Cityscapes), using strong per-pixel baselines (DeepLabV3, SegFormer) and mask classification baselines (Mask2Former, MaskFormer), validate the approach's effectiveness and generality. The authors release their code.
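To make the idea of classifying pixels in Lorentz space concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes unit negative curvature, lifts Euclidean backbone features onto the hyperboloid with the standard exponential map at the origin, and scores each pixel by its negative geodesic distance to a set of class prototypes. The function names (`expmap0`, `pixel_logits`) and the distance-to-prototype scoring rule are illustrative assumptions; the paper's text-guided, hierarchical formulation may differ.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi (last axis)."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def expmap0(u, eps=1e-8):
    """Lift Euclidean vectors u (..., n) onto the unit hyperboloid (..., n+1).

    Standard exponential map at the origin o = (1, 0, ..., 0), curvature -1.
    """
    norm = np.maximum(np.linalg.norm(u, axis=-1, keepdims=True), eps)
    time = np.cosh(norm)                 # time-like coordinate x0
    space = np.sinh(norm) * u / norm     # space-like coordinates x1..xn
    return np.concatenate([time, space], axis=-1)

def lorentz_distance(x, y):
    """Geodesic distance d(x, y) = arccosh(-<x, y>_L), clamped for safety."""
    return np.arccosh(np.maximum(-lorentz_inner(x, y), 1.0))

def pixel_logits(features, prototypes):
    """Hypothetical scoring rule: logit = negative distance to each prototype.

    features:   (H, W, n) Euclidean per-pixel features from any backbone
    prototypes: (K, n)    Euclidean class prototypes (assumed, e.g. from text)
    returns:    (H, W, K) logits
    """
    z = expmap0(features)[..., None, :]  # (H, W, 1, n+1)
    p = expmap0(prototypes)              # (K, n+1)
    return -lorentz_distance(z, p)       # broadcasts to (H, W, K)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(scale=0.5, size=(4, 6, 8))   # toy 4x6 feature map
    protos = rng.normal(scale=0.5, size=(3, 8))     # 3 toy classes
    logits = pixel_logits(feats, protos)
    print(logits.shape)                  # (4, 6, 3)
    print(logits.argmax(axis=-1))        # per-pixel predicted class ids
```

Because the lift and the distance are ordinary differentiable functions of Euclidean parameters, a sketch like this can be trained with a standard optimizer, which is consistent with the summary's point that no Riemannian optimizer is required.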

Abstract

Semantic segmentation in hyperbolic space enables compact modeling of hierarchical structure while providing inherent uncertainty quantification. Prior approaches predominantly rely on the Poincaré ball model, which suffers from numerical instability, optimization, and computational challenges. We propose a novel, tractable, architecture-agnostic semantic segmentation framework (pixel-wise and mask classification) in the hyperbolic Lorentz model. We employ text embeddings with semantic and visual cues to guide hierarchical pixel-level representations in Lorentz space. This enables stable and efficient optimization without requiring a Riemannian optimizer, and easily integrates with existing Euclidean architectures. Beyond segmentation, our approach yields free uncertainty estimation, confidence maps, boundary delineation, hierarchical and text-based retrieval, and zero-shot performance, reaching generalized flatter minima. We introduce a novel uncertainty and confidence indicator in Lorentz cone embeddings. Further, we provide analytical and empirical insights into Lorentz optimization via gradient analysis. Extensive experiments on ADE20K, COCO-Stuff-164k, Pascal-VOC, and Cityscapes, utilizing state-of-the-art per-pixel classification models (DeepLabV3 and SegFormer) and mask classification models (Mask2Former and MaskFormer), validate the effectiveness and generality of our approach. Our results demonstrate the potential of hyperbolic Lorentz embeddings for robust and uncertainty-aware semantic segmentation. Code is available at https://github.com/mxahan/Lorentz_semantic_segmentation.
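
For readers unfamiliar with the two hyperbolic models the abstract contrasts, the textbook definitions are given below, assuming curvature -1 (the paper may use a different curvature or parameterization). The Poincaré distance contains a denominator that vanishes as either point approaches the boundary of the unit ball, which is one commonly cited source of the numerical instability mentioned above. The Lorentz distance depends only on a single bilinear form.

```latex
% Lorentz (hyperboloid) model, curvature -1
\mathbb{L}^n = \{\, x \in \mathbb{R}^{n+1} : \langle x, x \rangle_{\mathcal{L}} = -1,\ x_0 > 0 \,\},
\qquad
\langle x, y \rangle_{\mathcal{L}} = -x_0 y_0 + \sum_{i=1}^{n} x_i y_i

d_{\mathcal{L}}(x, y) = \operatorname{arccosh}\!\left( -\langle x, y \rangle_{\mathcal{L}} \right)

% Poincare ball model, curvature -1, for u, v with \lVert u \rVert, \lVert v \rVert < 1
d_{\mathbb{B}}(u, v) = \operatorname{arccosh}\!\left( 1 + \frac{2\, \lVert u - v \rVert^2}{\left(1 - \lVert u \rVert^2\right)\left(1 - \lVert v \rVert^2\right)} \right)
```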