Hyp2Former: Hierarchy-Aware Hyperbolic Embeddings for Open-Set Panoptic Segmentation

arXiv cs.RO / 5/5/2026


Key Points

  • Hyp2Former is an end-to-end framework for Open-Set Panoptic Segmentation that aims to identify unknown objects as separate instances while segmenting known classes.
  • Unlike prior methods that treat known categories as a flat set, Hyp2Former explicitly leverages the semantic hierarchy by learning hierarchical similarities in hyperbolic embedding space.
  • The model does not require explicit modeling of unknowns during training, yet it preserves structured proximity between unknown objects and higher-level concepts (e.g., an unknown animal stays closer to “animal” or “object” than to unrelated concepts such as “electronics”).
  • Experiments on multiple datasets (MS COCO, Cityscapes, Lost&Found) show that Hyp2Former outperforms existing approaches, improving the trade-off between discovering unknown objects and maintaining robustness on in-distribution classes.
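To make "hierarchical similarities in hyperbolic embedding space" more concrete, here is a minimal sketch of the geometry involved. It is not Hyp2Former's implementation. It assumes the Poincaré ball model with curvature -1, a common choice for hyperbolic embeddings, but the summary does not say which model or curvature the paper uses. The helper names (`poincare_distance`, `expmap0`) and the hand-placed 2-D prototype coordinates are hypothetical. General concepts are placed near the origin and fine-grained classes near the boundary.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincare ball (curvature -1)."""
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    if sq_x >= 1.0 or sq_y >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    # d(x, y) = arcosh(1 + 2 * |x - y|^2 / ((1 - |x|^2) * (1 - |y|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y)))


def expmap0(v):
    """Exponential map at the origin: sends a Euclidean feature vector into the ball."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        return [0.0] * len(v)
    # The output norm is tanh(norm), which is < 1 mathematically; real code
    # usually also clips points that round onto the boundary in floating point.
    scale = math.tanh(norm) / norm
    return [a * scale for a in v]


# Hypothetical hand-placed prototypes for a toy hierarchy.
animal = (0.5, 0.0)
electronics = (-0.5, 0.0)
dog = (0.9, 0.1)

print(poincare_distance(dog, animal) < poincare_distance(dog, electronics))  # True
feature = expmap0([3.0, 4.0])  # a Euclidean feature mapped just inside the ball
```

Distances grow rapidly as points approach the boundary, which is why hyperbolic space can hold many fine-grained leaves near the edge while keeping their shared parents close to the center.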

Abstract

Recognizing unknown objects is crucial for safety-critical applications such as autonomous driving and robotics. Open-Set Panoptic Segmentation (OPS) aims to segment known thing and stuff classes while identifying valid unknown objects as separate instances. Prior OPS approaches largely treat known categories as a flat label set, ignoring the semantic hierarchy that provides valuable structural priors for distinguishing unknown objects from in-distribution classes. In this work, we propose Hyp2Former, an end-to-end framework for OPS that does not require explicit modeling of unknowns during training, and instead learns hierarchical semantic similarities continuously in hyperbolic space. By explicitly encoding hierarchical relationships among known categories, the model learns a structured embedding space that captures multiple levels of semantic abstraction. As a result, unknown objects that cannot be confidently classified as known categories still remain in close proximity to higher-level concepts (e.g., an unknown animal remains closer to "animal" or "object" than to unrelated concepts such as "electronics" or "stuff") and can therefore be reliably detected, even if their fine-grained category was not represented during training. Empirical evaluations across multiple public datasets such as MS COCO, Cityscapes, and Lost&Found demonstrate that Hyp2Former outperforms existing methods on OPS, achieving the best balance between unknown object discovery and in-distribution robustness.
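The abstract argues that an object that cannot be confidently assigned to a known fine-grained class still lands near a higher-level concept. The toy decision rule below is a hypothetical simplification, not the paper's method, because the summary does not describe how Hyp2Former scores or thresholds unknowns. The rule is plain nearest-prototype rejection with a distance threshold. It assumes a curvature -1 Poincaré ball, and the hierarchy, 2-D prototype coordinates, `nearest`/`classify` helpers, and `leaf_threshold` value are all invented for illustration.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance in the unit Poincare ball (curvature -1)."""
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y)))


# Hypothetical toy hierarchy: object -> {animal, electronics} -> {dog, cat, phone}.
PROTOTYPES = {
    "object": (0.0, 0.0),
    "animal": (0.5, 0.0),
    "electronics": (-0.5, 0.0),
    "dog": (0.9, 0.1),
    "cat": (0.9, -0.1),
    "phone": (-0.9, 0.1),
}
PARENT = {"animal": "object", "electronics": "object",
          "dog": "animal", "cat": "animal", "phone": "electronics"}
LEAVES = ("dog", "cat", "phone")
INTERNAL = tuple(name for name in PROTOTYPES if name not in LEAVES)


def nearest(query, names):
    """Return (name, distance) of the prototype in `names` closest to `query`."""
    return min(((n, poincare_distance(query, PROTOTYPES[n])) for n in names),
               key=lambda pair: pair[1])


def classify(query, leaf_threshold=1.0):
    """Assign a known leaf class, or flag 'unknown' and report the closest higher-level concept."""
    leaf, d_leaf = nearest(query, LEAVES)
    if d_leaf <= leaf_threshold:
        return leaf, PARENT[leaf]
    ancestor, _ = nearest(query, INTERNAL)
    return "unknown", ancestor


print(classify((0.88, 0.12)))  # ('dog', 'animal')
print(classify((0.6, 0.45)))   # ('unknown', 'animal')
```

The query at (0.6, 0.45) is too far from every leaf prototype to be called "dog" or "cat", yet its nearest internal node is "animal" rather than "electronics", mirroring the abstract's example. A trained model would produce the embeddings and calibrate the threshold rather than rely on hand-placed points.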