EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation
arXiv cs.CV / 3/20/2026
Key Points
- The paper introduces EdgeCrafter, a unified compact ViT framework for edge dense prediction that addresses the performance gap of small-scale ViTs on resource-constrained devices.
- It centers on ECDet, a detection model built from a distilled compact backbone and an edge-friendly encoder-decoder design to enable efficient object detection, instance segmentation, and pose estimation.
- On COCO, ECDet-S achieves 51.7 AP with fewer than 10M parameters using only COCO annotations, and ECInsSeg reaches performance comparable to RF-DETR with substantially fewer parameters; ECPose-X attains 74.8 AP, outperforming YOLO26Pose-X despite less extensive pretraining.
- The results imply that compact ViTs paired with task-specific distillation and edge-aware design can be a practical and competitive option for edge dense prediction, with code released for community use.
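
The summary above does not say how EdgeCrafter distills its compact backbone, so the following is only a generic illustration of feature-level distillation, not the paper's method. It assumes a student backbone with narrower features than the teacher, a linear projection to match widths, and a mean-squared-error objective. All names (`project`, `feature_distillation_loss`, `weight`) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def project(features: Matrix, weight: Matrix) -> Matrix:
    """Map each student feature vector (width d_s) to the teacher width d_t.

    `weight` has shape d_s x d_t. In a real setup it would be learned;
    here it is passed in as a fixed matrix (an assumption for illustration).
    """
    d_t = len(weight[0])
    return [
        [sum(f[i] * weight[i][j] for i in range(len(f))) for j in range(d_t)]
        for f in features
    ]


def feature_distillation_loss(student: Matrix, teacher: Matrix, weight: Matrix) -> float:
    """Mean squared error between projected student features and teacher features."""
    projected = project(student, weight)
    if len(projected) != len(teacher):
        raise ValueError("student and teacher must have the same number of tokens")
    total, count = 0.0, 0
    for p, t in zip(projected, teacher):
        if len(p) != len(t):
            raise ValueError("projected width must match teacher width")
        for a, b in zip(p, t):
            total += (a - b) ** 2
            count += 1
    return total / count


# Toy example: 2 tokens, student width 2, teacher width 3.
student_feats = [[1.0, 2.0], [0.0, 1.0]]
teacher_feats = [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]]
proj_weight = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

loss = feature_distillation_loss(student_feats, teacher_feats, proj_weight)
print(f"distillation loss: {loss:.4f}")
```

In practice a term like this would be minimized alongside, or before, the task losses for detection, segmentation, or pose. How the paper weights or specializes it per task is not stated in this summary.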