PanORama: Multiview Consistent Panoptic Segmentation in Operating Rooms

arXiv cs.CV / 3/23/2026


Key Points

  • PanORama proposes multiview-consistent panoptic segmentation for operating rooms, modeling cross-view interactions inside the backbone in a single forward pass so that view coherence emerges by design.
  • The method is calibration-free, requiring no camera parameters, and generalizes to unseen camera viewpoints at inference time.
  • It achieves over 70% Panoptic Quality on the MM-OR and 4D-OR datasets, surpassing the prior state of the art.
  • The work aims to enhance surgical spatial understanding and perception, enabling improved assistance, with code to be released upon acceptance.

Abstract

Operating rooms (ORs) are cluttered, dynamic, highly occluded environments, where reliable spatial understanding is essential for situational awareness during complex surgical workflows. Achieving spatial understanding for panoptic segmentation from sparse multiview images poses a fundamental challenge, as limited visibility in a subset of views often leads to mispredictions across cameras. To this end, we introduce PanORama, the first panoptic segmentation approach for the operating room that is multiview-consistent by design. By modeling cross-view interactions at the feature level inside the backbone in a single forward pass, view consistency emerges directly rather than through post-hoc refinement. We evaluate on the MM-OR and 4D-OR datasets, achieving over 70% Panoptic Quality (PQ) and outperforming the previous state of the art. Importantly, PanORama is calibration-free, requiring no camera parameters, and generalizes to unseen camera viewpoints within any multiview configuration at inference time. By substantially enhancing multiview segmentation and, consequently, spatial understanding in the OR, we believe our approach opens new opportunities for surgical perception and assistance. Code will be released upon acceptance.
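The abstract does not specify the exact cross-view mechanism, but one common way to realize calibration-free feature-level interaction inside a backbone is attention over the tokens of all views jointly, so that each view's features can borrow evidence from the others without any camera parameters. The sketch below is purely illustrative (the function and shapes are assumptions, not PanORama's actual architecture): it flattens V views into one token set and applies a single self-attention step with a residual connection.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(feats):
    """Hypothetical cross-view feature fusion.

    feats: (V, N, C) backbone features for V camera views,
    each with N spatial tokens of C channels. Every token attends
    to all tokens from all views, so information is shared across
    cameras without using any calibration (no camera parameters).
    """
    V, N, C = feats.shape
    tokens = feats.reshape(V * N, C)           # merge all views into one token set
    scores = tokens @ tokens.T / np.sqrt(C)    # pairwise similarity across views
    attn = softmax(scores, axis=-1)
    fused = attn @ tokens                      # each token aggregates all views
    return feats + fused.reshape(V, N, C)      # residual update, shape preserved

rng = np.random.default_rng(0)
out = cross_view_attention(rng.normal(size=(3, 4, 8)))
```

Because the attention treats views as an unordered token set, the fusion is equivariant to view ordering, which is consistent with the paper's claim of generalizing to unseen camera viewpoints within any multiview configuration.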