CoCo-SAM3: Harnessing Concept Conflict in Open-Vocabulary Semantic Segmentation

arXiv cs.CV / 4/22/2026


Key Points

  • CoCo-SAM3 builds on SAM3's prompt-driven mask generation for open-vocabulary semantic segmentation, targeting the instability that arises when multiple category prompts are handled independently.
  • The paper identifies two main failure modes in multi-class settings: lack of a unified, comparable evidence scale across classes and intra-class drift caused by synonymous prompts producing inconsistent evidence.
  • CoCo-SAM3 addresses both by decoupling inference into two stages: intra-class enhancement (aligning and aggregating evidence from synonymous prompts) and inter-class competition (comparing all candidate classes pixel by pixel on a unified, comparable scale).
  • The method improves multi-class inference stability and reduces inter-class conflicts without requiring any additional training.
  • Reported results show consistent gains across eight open-vocabulary semantic segmentation benchmarks.
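
The two-stage structure described in the key points can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, a plain mean over synonymous prompts stands in for the alignment-and-aggregation step, and a softmax across classes stands in for the unified comparable scale. The summary does not specify how either step is actually computed.

```python
import numpy as np


def intra_class_enhance(prompt_scores: np.ndarray) -> np.ndarray:
    """Fuse score maps from synonymous prompts, shape (P, H, W) -> (H, W).

    Assumption: a simple mean replaces the paper's alignment and
    aggregation, whose details are not given in this summary.
    """
    return prompt_scores.mean(axis=0)


def inter_class_compete(class_scores: np.ndarray) -> np.ndarray:
    """Pick one class per pixel from stacked maps, shape (C, H, W) -> (H, W).

    Assumption: a softmax over the class axis puts all classes on one
    scale; the paper's actual normalization may differ.
    """
    shifted = class_scores - class_scores.max(axis=0, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=0, keepdims=True)
    return probs.argmax(axis=0)


def segment(scores_by_class: dict) -> tuple:
    """Run both stages; returns (class names, label map of class indices)."""
    names = list(scores_by_class)
    fused = np.stack([intra_class_enhance(scores_by_class[n]) for n in names])
    return names, inter_class_compete(fused)


# Toy 1x2 image with two synonymous prompts per class (made-up scores).
toy_scores = {
    "cat": np.array([[[2.0, 0.0]], [[1.0, 0.5]]]),
    "dog": np.array([[[0.5, 1.0]], [[0.0, 2.0]]]),
}
names, labels = segment(toy_scores)
print(names, labels.tolist())
```

Keeping the two stages as separate functions mirrors the decoupling the summary describes: fusing synonyms happens per class, and only the fused maps enter the cross-class comparison.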

Abstract

SAM3 advances open-vocabulary semantic segmentation by introducing a prompt-driven mask generation paradigm. However, in multi-class open-vocabulary scenarios, masks generated independently from different category prompts lack a unified and inter-class comparable evidence scale, often resulting in overlapping coverage and unstable competition. Moreover, synonymous expressions of the same concept tend to activate inconsistent semantic and spatial evidence, leading to intra-class drift that exacerbates inter-class conflicts and compromises overall inference stability. To address these issues, we propose CoCo-SAM3 (Concept-Conflict SAM3), which explicitly decouples inference into intra-class enhancement and inter-class competition. Our method first aligns and aggregates evidence from synonymous prompts to strengthen concept consistency. It then performs inter-class competition on a unified comparable scale, enabling direct pixel-wise comparisons among all candidate classes. This mechanism stabilizes multi-class inference and effectively mitigates inter-class conflicts. Without requiring any additional training, CoCo-SAM3 achieves consistent improvements across eight open-vocabulary semantic segmentation benchmarks.
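
To make the "overlapping coverage" problem concrete, the toy example below thresholds each class map on its own and counts how many classes claim each pixel. The scores and the 0.5 threshold are made up for illustration; they are not taken from the paper or from SAM3's actual outputs.

```python
import numpy as np

# Hypothetical per-class confidence maps for a 1x3 image, shape (C, H, W).
scores = np.array([
    [[0.9, 0.6, 0.2]],  # class 0
    [[0.3, 0.7, 0.8]],  # class 1
])

# Independent decision: each class is thresholded without seeing the others.
masks = scores > 0.5

# A pixel is in conflict when more than one class mask covers it.
claims_per_pixel = masks.sum(axis=0)
overlap = claims_per_pixel > 1

print(claims_per_pixel.tolist(), overlap.tolist())
```

The middle pixel is claimed by both classes. Resolving such conflicts by comparing the raw scores is only meaningful if the scores of different classes are on a comparable scale, which is the gap the abstract points to.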