OV-Stitcher: A Global Context-Aware Framework for Training-Free Open-Vocabulary Semantic Segmentation
arXiv cs.CV / 4/10/2026
Key Points
- OV-Stitcher is a training-free framework for open-vocabulary semantic segmentation. It targets a limitation of existing methods, which split the image into sliding-window crops because the encoder accepts only a limited input resolution.
- Instead of processing sub-images independently, OV-Stitcher “stitches” the fragmented sub-image features together inside the final encoder block, reconstructing attention representations that carry global, full-image context.
- This design yields more coherent context aggregation and spatially consistent, semantically aligned segmentation outputs compared with prior training-free baselines.
- Experiments across eight benchmarks show mIoU rising from 48.7 to 50.7 (a gain of 2.0 points) over existing training-free approaches, indicating that the approach scales better for open-vocabulary segmentation.
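
The difference between per-window attention and stitched attention described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`windowed_attention`, `stitched_attention`) are hypothetical, the windows are assumed to be non-overlapping, and the tokens are used directly as queries, keys, and values without the learned projections a real encoder block would apply.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return outputs


def windowed_attention(windows):
    """Baseline sketch: each sub-image attends only to its own tokens."""
    return [attention(w, w, w) for w in windows]


def stitched_attention(windows):
    """Stitching sketch (assumption, not the paper's code): concatenate the
    tokens of all windows so every query attends over the full image, then
    split the result back into per-window groups."""
    tokens = [tok for w in windows for tok in w]
    attended = attention(tokens, tokens, tokens)
    result, start = [], 0
    for w in windows:
        result.append(attended[start:start + len(w)])
        start += len(w)
    return result
```

Calling both functions on the same list of windows returns outputs of identical shape. With more than one window the values generally differ, because in the stitched version each token's output also mixes in tokens from the other windows.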



