Detect Anything in Real Time: From Single-Prompt Segmentation to Multi-Class Detection
arXiv cs.CV / 3/13/2026
Key Points
- DART is a training-free framework that converts SAM3 into a real-time multi-class detector. Because SAM3's visual backbone is class-agnostic, its computation can be shared across all classes, reducing the backbone cost from O(N) to O(1) in the number of classes N.
- By combining batched multi-class decoding, detection-only inference, and TensorRT FP16 deployment, DART delivers a 5.6x cumulative speedup for 3 classes and up to 25x for 80 classes without changing any model weights.
- On COCO val2017 (5,000 images, 80 classes), DART reaches 55.8 AP, outperforming purpose-built open-vocabulary detectors trained on millions of box annotations, and runs at 15.8 FPS (4 classes, 1008x1008) on a single RTX 4080.
- For extreme latency targets, adapter distillation with a frozen encoder-decoder can achieve 38.7 AP with a 13.9 ms backbone.
- Code and models for DART are available at the project GitHub repository https://github.com/mkturkcan/DART.
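The core idea behind the first point, computing a class-agnostic backbone once and reusing its features for every class prompt, can be illustrated with a toy sketch. Everything here is hypothetical: `ToyModel`, `backbone`, `decode`, and the tanh "features" are stand-ins, not SAM3's or DART's actual API. The sketch only shows that decoding all prompts from one shared backbone pass gives the same per-class outputs as a per-class loop while cutting backbone calls from N to 1.

```python
import math
from typing import List, Sequence

Vector = List[float]


class ToyModel:
    """Hypothetical stand-in for a promptable detector with a class-agnostic backbone."""

    def __init__(self) -> None:
        self.backbone_calls = 0

    def backbone(self, image: Sequence[float]) -> Vector:
        # Expensive, prompt-independent feature extraction; counted to expose its cost.
        self.backbone_calls += 1
        return [math.tanh(x) for x in image]

    @staticmethod
    def decode(features: Vector, prompts: Sequence[Vector]) -> List[float]:
        # Cheap, prompt-dependent head: one toy score per class prompt.
        # All prompts are handled in a single call, standing in for batched decoding.
        return [sum(f * p for f, p in zip(features, prompt)) for prompt in prompts]


def detect_per_class(model: ToyModel, image: Sequence[float],
                     prompts: Sequence[Vector]) -> List[float]:
    # Naive multi-class use: rerun the full model once per class, so N backbone calls.
    return [model.decode(model.backbone(image), [p])[0] for p in prompts]


def detect_shared(model: ToyModel, image: Sequence[float],
                  prompts: Sequence[Vector]) -> List[float]:
    # Shared-backbone use: compute features once, then decode every class from them.
    features = model.backbone(image)
    return model.decode(features, prompts)


image = [0.5, -1.0, 2.0]
prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
naive, shared = ToyModel(), ToyModel()
assert detect_per_class(naive, image, prompts) == detect_shared(shared, image, prompts)
print(naive.backbone_calls, shared.backbone_calls)  # prints "4 1"
```

Because the backbone output does not depend on the prompt, the two paths perform identical arithmetic per class; only the number of backbone passes differs.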
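Why backbone sharing pays off more as the class count grows can be seen with a simple latency model. Assume, hypothetically, that one backbone pass costs t_b ms and decoding one class costs t_d ms. Running the full model per class then costs N(t_b + t_d), while sharing the backbone costs t_b + N·t_d, so the speedup rises with N toward (t_b + t_d)/t_d. The timings below are invented for illustration and are not measurements from the paper; the model also ignores the detection-only inference and TensorRT FP16 gains that contribute to the reported 5.6x and 25x figures.

```python
def per_class_latency(n_classes: int, t_backbone: float, t_decode: float) -> float:
    """Backbone + decoder rerun for every class: N backbone passes."""
    return n_classes * (t_backbone + t_decode)


def shared_latency(n_classes: int, t_backbone: float, t_decode: float) -> float:
    """One backbone pass shared by all classes, plus per-class decoding."""
    return t_backbone + n_classes * t_decode


def sharing_speedup(n_classes: int, t_backbone: float, t_decode: float) -> float:
    """Ratio of naive to shared latency under this simplified cost model."""
    return (per_class_latency(n_classes, t_backbone, t_decode)
            / shared_latency(n_classes, t_backbone, t_decode))


if __name__ == "__main__":
    # Hypothetical timings in milliseconds, chosen only to show the trend.
    T_BACKBONE, T_DECODE = 40.0, 2.0
    for n in (1, 3, 80):
        print(f"N={n:2d}: {sharing_speedup(n, T_BACKBONE, T_DECODE):.2f}x")
    # N= 1: 1.00x
    # N= 3: 2.74x
    # N=80: 16.80x
```

The model treats decoding as linear in N, which is conservative: batching the decoder across classes, as DART does, can amortize that term further.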