From Skeletons to Pixels: Few-Shot Precise Event Spotting via Representation and Prediction Distillation
arXiv cs.CV / 4/28/2026
Key Points
- The paper addresses Precise Event Spotting (PES) in fast sports like tennis, where accurate frame-level event localization is difficult due to motion blur, fine-grained action differences, and scarce annotations.
- It proposes two few-shot distillation approaches: Adaptive Weight Distillation (AWD), which adaptively reweights teacher predictions on unlabeled data, and AMD-FED, which distills robust “skeleton” knowledge into visual representations via annealed pseudo-labeling.
- Both methods use multimodal distillation to transfer information across modalities, with the aim of improving generalization when labeled data is limited.
- Experiments on F3Set-Tennis(sub) under k-clip few-shot settings show consistent gains over single-modality baselines and previous PES methods, and AMD-FED also performs robustly on Figure Skating.
- The results indicate that representation-level multimodal distillation, especially skeleton-to-visual transfer, can be particularly effective for few-shot precise event spotting.
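
To make the two ideas in the key points more concrete, here is a minimal Python sketch of (a) a distillation loss in which each unlabeled frame is weighted by the teacher's confidence and (b) pseudo-label selection with an annealed confidence threshold. This is not the paper's formulation: the summary does not give the AWD weighting rule or the AMD-FED annealing schedule, so the max-probability weight, the linear schedule that lowers the threshold over training, and all function names and default values below are assumptions chosen for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of per-class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def weighted_distillation_loss(teacher_logits, student_logits):
    """Confidence-weighted mean of per-frame KL(teacher || student).

    Assumption: the weight of a frame is the teacher's maximum class
    probability. The paper's adaptive weighting may differ.
    """
    weighted_sum, weight_total = 0.0, 0.0
    for t, s in zip(teacher_logits, student_logits):
        p_teacher, p_student = softmax(t), softmax(s)
        w = max(p_teacher)
        weighted_sum += w * kl_divergence(p_teacher, p_student)
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else 0.0

def annealed_threshold(step, total_steps, start=0.95, end=0.70):
    """Linear schedule from `start` to `end` (hypothetical values)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac

def select_pseudo_labels(teacher_logits, threshold):
    """Return (frame_index, class_index) pairs whose teacher confidence
    is at least `threshold`."""
    selected = []
    for i, t in enumerate(teacher_logits):
        probs = softmax(t)
        conf = max(probs)
        if conf >= threshold:
            selected.append((i, probs.index(conf)))
    return selected

# Toy usage: two unlabeled frames, three event classes.
teacher = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0]]   # e.g. skeleton-based teacher
student = [[0.0, 4.0, 0.0], [0.0, 0.0, 0.0]]   # e.g. visual student
loss = weighted_distillation_loss(teacher, student)
early = select_pseudo_labels(teacher, annealed_threshold(0, 10))
late = select_pseudo_labels(teacher, annealed_threshold(10, 10))
```

In this toy setup the confident first frame dominates the loss, while the near-uniform second frame contributes with a smaller weight. Whether a threshold should tighten or loosen during training is a design choice; the sketch lowers it only to show the mechanism.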