Direct Segmentation without Logits Optimization for Training-Free Open-Vocabulary Semantic Segmentation
arXiv cs.CV / 4/10/2026
Key Points
- The paper addresses open-vocabulary semantic segmentation without the usual pixel-level vision-language alignment step, which relies on cosine-similarity “logits” and their iterative optimization.
- It proposes a training-free method that derives an analytic solution for the semantic segmentation map instead of optimizing logits with time-consuming training or model-specific attention modulation.
- The core hypothesis is that the distribution discrepancy between visual and linguistic features encodes semantics: it is consistent across image patches of the same category and inconsistent across patches of different categories.
- By directly using the analytic solution of this distribution discrepancy, the approach avoids iterative training and still achieves state-of-the-art results across eight benchmark datasets.
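
For context, the conventional cosine-similarity "logits" step that the key points describe can be sketched roughly as follows. This is only an illustration of the baseline the paper moves away from, not the paper's analytic solution, which this summary does not spell out. The function names, the toy 2-D feature vectors, and the argmax labeling rule are all assumptions made for the example.

```python
import math


def cosine_logits(patch_feats, text_feats):
    """Return a [num_patches][num_classes] table of cosine similarities.

    Assumes every feature vector is non-zero (hypothetical toy setting).
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    patches = [normalize(p) for p in patch_feats]
    texts = [normalize(t) for t in text_feats]
    # Dot product of unit vectors equals cosine similarity.
    return [[sum(a * b for a, b in zip(p, t)) for t in texts] for p in patches]


def segment(patch_feats, text_feats):
    """Label each patch with the index of its most similar text embedding."""
    logits = cosine_logits(patch_feats, text_feats)
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Toy example: three image patches, two class prompts (made-up numbers).
patch_feats = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
text_feats = [[1.0, 0.0], [0.0, 1.0]]
labels = segment(patch_feats, text_feats)
print(labels)
```

In a real pipeline the patch and text features would come from a vision-language encoder, and the per-patch labels would be reshaped and upsampled into a segmentation map. According to the summary, the proposed method skips any optimization of such logits.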



