Lightweight Distillation of SAM 3 and DINOv3 for Edge-Deployable Individual-Level Livestock Monitoring and Longitudinal Visual Analytics
arXiv cs.CV / 5/1/2026
Key Points
- The study proposes compressing large foundation-model components for precision livestock farming so they can run on commodity edge accelerators with limited GPU memory.
- It distills SAM 3's 446M-parameter perception encoder into a 40.66M-parameter multi-scale student built on a TinyViT feature-pyramid encoder, trained with a four-term direction-then-scale distillation loss; sliding-window inference with session pruning caps memory growth on streaming video (see the loss and pruning sketches after this list).
- It also uses a DINOv3-family ViT-S/16 variant (about 21.6M parameters), itself pre-distilled from a much larger ViT-7B teacher, as the per-animal embedder.
- Experiments on the Edinburgh Pig dataset show strong agreement with the SAM 3 teacher while substantially reducing system size and peak VRAM, and the approach also supports multi-class pig behavior classification with high accuracy.
- The full pipeline is demonstrated to fit on an NVIDIA Jetson Orin NX 16GB, and the paper outlines an as-yet-unvalidated on-device embedding-pool re-identification mechanism for building longitudinal visual records that can later be associated with outcomes such as disease and lameness (a minimal sketch of the pooling idea follows this list).
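
The "direction-then-scale" phrasing suggests the distillation loss separates angular alignment of teacher and student features from magnitude matching. The paper's exact four terms are not spelled out in this digest, so the sketch below assumes one cosine (direction) term and one L1 norm (scale) term per pyramid level; the function name, weights, and feature shapes are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def direction_then_scale_loss(student_feats, teacher_feats, w_dir=1.0, w_scale=1.0):
    """Hypothetical sketch of a direction-then-scale feature distillation loss.

    Assumes matched lists of (B, C, H, W) feature maps, one per pyramid level.
    Each level contributes a cosine (direction) term and an L1 norm-matching
    (scale) term, summed across levels.
    """
    loss = student_feats[0].new_zeros(())
    for s, t in zip(student_feats, teacher_feats):
        s = s.flatten(2).transpose(1, 2)  # (B, N, C) tokens
        t = t.flatten(2).transpose(1, 2)
        # Direction: align feature orientations via cosine distance.
        dir_term = (1.0 - F.cosine_similarity(s, t, dim=-1)).mean()
        # Scale: once directions agree, match per-token feature magnitudes.
        scale_term = F.l1_loss(s.norm(dim=-1), t.norm(dim=-1))
        loss = loss + w_dir * dir_term + w_scale * scale_term
    return loss
```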
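Session pruning is described only as a way to cap streaming memory growth. One plausible reading, sketched below under that assumption, is a bounded temporal window over per-frame memory entries, so VRAM scales with the window size rather than the video length; the class and method names are illustrative.

```python
from collections import deque

class PrunedSession:
    """Minimal sketch of sliding-window streaming with session pruning.

    Assumes the tracker keeps one memory entry per processed frame and that
    entries older than `window` frames can be dropped without rerunning them.
    """
    def __init__(self, window: int = 16):
        self.memory = deque(maxlen=window)  # oldest entries pruned automatically

    def step(self, frame_embedding):
        self.memory.append(frame_embedding)  # admit the newest frame
        return list(self.memory)             # bounded context for the decoder
```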
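The embedding-pool re-identification mechanism is explicitly flagged as unvalidated, and the digest gives no matching rule. A minimal sketch, assuming each known animal keeps a bounded pool of past DINOv3-style embeddings (e.g. the 384-d ViT-S/16 output vector) and new crops are assigned by maximum cosine similarity with a new-identity threshold; the threshold, pool size, and class names below are assumptions, not the paper's design.

```python
import torch
import torch.nn.functional as F

class EmbeddingPoolReID:
    """Hedged sketch of an on-device embedding-pool re-ID scheme."""

    def __init__(self, sim_threshold: float = 0.6, pool_size: int = 32):
        self.pools: dict[int, list[torch.Tensor]] = {}  # animal id -> embeddings
        self.sim_threshold = sim_threshold
        self.pool_size = pool_size

    def assign(self, emb: torch.Tensor) -> int:
        """Match a new per-animal embedding to a pool, or open a new identity."""
        emb = F.normalize(emb, dim=-1)
        best_id, best_sim = -1, -1.0
        for animal_id, pool in self.pools.items():
            sim = (torch.stack(pool) @ emb).max().item()  # cosine vs. pool
            if sim > best_sim:
                best_id, best_sim = animal_id, sim
        if best_sim < self.sim_threshold:  # no pool is similar enough: new animal
            best_id = len(self.pools)
            self.pools[best_id] = []
        pool = self.pools[best_id]
        pool.append(emb)
        if len(pool) > self.pool_size:     # keep per-animal memory bounded
            pool.pop(0)
        return best_id
```

Pooling several past embeddings per animal, rather than a single prototype, is what would let the longitudinal record tolerate appearance drift (growth, soiling, lighting) between visits.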