DeCo-DETR: Decoupled Cognition DETR for Efficient Open-Vocabulary Object Detection
arXiv cs.CV / 4/6/2026
Key Points
- DeCo-DETR is a new vision-centric framework for open-vocabulary object detection (OVOD) that targets the practical deployment limits of existing OVOD methods.
- The method avoids costly text encoding at inference time by building a hierarchical semantic prototype space offline from region-level descriptions produced by pre-trained large vision-language models (LVLMs) and aligned via CLIP, so the resulting semantics can be reused rather than recomputed for every query.
- It also improves training dynamics by decoupling semantic reasoning from localization: alignment and detection run as parallel optimization streams, which eases the accuracy–generalization trade-off typical of OVOD training.
- Experiments on standard OVOD benchmarks show competitive zero-shot performance alongside significantly improved inference efficiency, suggesting better scalability for real-world systems.
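
The efficiency point above rests on moving text encoding out of the inference loop. The following is a minimal, hypothetical sketch of that general idea, not the paper's actual method. It assumes each fine-grained class has already been described by an LVLM and those descriptions encoded into vectors by a CLIP-style text encoder (toy 3-d vectors stand in for them here). It averages the description embeddings into class prototypes, groups classes under coarse parent prototypes, and at inference classifies a region embedding coarse-to-fine using only dot products. The function names `build_prototype_space` and `classify_region`, the mean-pooling of descriptions, and the two-level hierarchy are all illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical layout: space[parent] = (parent_prototype, {child: child_prototype})
PrototypeSpace = Dict[str, Tuple[Vector, Dict[str, Vector]]]


def normalize(v: Vector) -> Vector:
    """L2-normalize a vector (returned unchanged if it has zero norm)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity when both inputs are unit-norm."""
    return sum(x * y for x, y in zip(a, b))


def mean_prototype(vectors: List[Vector]) -> Vector:
    """Average a set of unit vectors and re-normalize (assumed pooling rule)."""
    dim = len(vectors[0])
    return normalize([sum(v[i] for v in vectors) for i in range(dim)])


def build_prototype_space(desc_embeddings: Dict[str, Dict[str, List[Vector]]]) -> PrototypeSpace:
    """Offline step: desc_embeddings[parent][child] holds precomputed text
    embeddings of region-level descriptions for one fine-grained class."""
    space: PrototypeSpace = {}
    for parent, children in desc_embeddings.items():
        child_protos = {
            child: mean_prototype([normalize(e) for e in embs])
            for child, embs in children.items()
        }
        parent_proto = mean_prototype(list(child_protos.values()))
        space[parent] = (parent_proto, child_protos)
    return space


def classify_region(region_emb: Vector, space: PrototypeSpace) -> Tuple[str, str, float]:
    """Inference step: coarse-to-fine lookup against cached prototypes.
    No text encoder is called here, only dot products."""
    r = normalize(region_emb)
    parent = max(space, key=lambda p: cosine(r, space[p][0]))
    children = space[parent][1]
    child = max(children, key=lambda c: cosine(r, children[c]))
    return parent, child, cosine(r, children[child])


# Toy stand-ins for CLIP-encoded LVLM descriptions (purely illustrative numbers).
descs = {
    "animal": {
        "dog": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
        "cat": [[0.8, 0.0, 0.3], [0.9, 0.0, 0.4]],
    },
    "vehicle": {
        "car": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
        "bus": [[0.0, 0.8, 0.6]],
    },
}
space = build_prototype_space(descs)  # done once, offline
print(classify_region([0.95, 0.15, 0.0], space))
print(classify_region([0.05, 0.85, 0.5], space))
```

The design point the sketch illustrates is cost placement: building the space is the only step that touches text, so the per-region cost at inference scales with the number of cached prototypes rather than with running a text encoder. The coarse-to-fine search is one plausible way to use a hierarchy; the paper may organize or query its prototype space differently.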
