Detect Anything in Real Time: From Single-Prompt Segmentation to Multi-Class Detection
arXiv cs.CV / 3/13/2026
Key Points
- DART is a training-free framework that converts SAM3 into a real-time multi-class detector. Because the visual backbone is class-agnostic, its computation can be shared across all classes, reducing the backbone's share of inference cost from O(N) to O(1) in the number of classes N.
- By combining batched multi-class decoding, detection-only inference, and TensorRT FP16 deployment, DART delivers a 5.6x cumulative speedup for 3 classes and up to 25x for 80 classes without changing any model weights.
- On COCO val2017 (5,000 images, 80 classes), DART achieves 55.8 AP at 15.8 FPS (4 classes, 1008x1008) on a single RTX 4080, outperforming purpose-built open-vocabulary detectors trained on millions of box annotations.
- For extreme latency targets, adapter distillation with a frozen encoder-decoder can achieve 38.7 AP with a 13.9 ms backbone.
- Code and models for DART are available at the project GitHub repository https://github.com/mkturkcan/DART.
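
The shared-backbone idea in the first key point can be illustrated with a toy sketch. This is not the DART or SAM3 code: `CountingBackbone`, `decode`, `detect_per_class`, and `detect_shared` are hypothetical names, and the "features" are trivial per-pixel statistics. The only assumption carried over from the summary is that the backbone does not depend on the class prompt, so its output can be computed once and reused for every class.

```python
class CountingBackbone:
    """Toy stand-in for a class-agnostic visual backbone (hypothetical, not SAM3)."""

    def __init__(self):
        self.calls = 0  # number of backbone passes executed so far

    def __call__(self, image):
        # image: list of pixels, each a list of channel values.
        # The output does not depend on any class prompt.
        self.calls += 1
        return [[sum(px) / len(px), max(px)] for px in image]


def decode(features, prompt):
    """Toy prompt-conditioned decoder: one score per pixel for one class prompt."""
    return [sum(f * w for f, w in zip(feat, prompt)) for feat in features]


def detect_per_class(backbone, image, prompts):
    # Baseline: a full backbone pass for every class prompt (N passes).
    return [decode(backbone(image), prompt) for prompt in prompts]


def detect_shared(backbone, image, prompts):
    # Shared variant: one backbone pass, then every prompt decodes the same features.
    features = backbone(image)
    return [decode(features, prompt) for prompt in prompts]


image = [[0.1, 0.5, 0.9], [0.3, 0.3, 0.3], [0.8, 0.2, 0.4], [0.0, 1.0, 0.5]]
prompts = [[i / 80, 1 - i / 80] for i in range(80)]  # 80 toy class prompts

baseline, shared = CountingBackbone(), CountingBackbone()
scores_baseline = detect_per_class(baseline, image, prompts)
scores_shared = detect_shared(shared, image, prompts)

print(baseline.calls, shared.calls)
print(scores_baseline == scores_shared)
```

In this sketch the baseline runs the backbone 80 times and the shared variant runs it once, while both produce the same per-class scores. The decoder still runs once per class here; per the summary, DART additionally batches the multi-class decoding, which this toy does not model.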