From Skeletons to Pixels: Few-Shot Precise Event Spotting via Representation and Prediction Distillation

arXiv cs.CV / 4/28/2026


Key Points

  • The paper addresses Precise Event Spotting (PES) in fast sports like tennis, where accurate frame-level event localization is difficult due to motion blur, fine-grained action differences, and scarce annotations.
  • It proposes two few-shot distillation approaches: Adaptive Weight Distillation (AWD), a prediction-level method that adaptively reweights teacher predictions on unlabeled data, and Annealed Multimodal Distillation for Few-Shot Event Detection (AMD-FED), a representation-level method that distills robust “skeleton” knowledge into visual representations via annealed pseudo-labeling.
  • Both methods rely on multimodal distillation to improve generalization when labeled data is limited, focusing on transferring useful information across modalities.
  • Experiments on F3Set-Tennis(sub) under k-clip few-shot settings show consistent gains over single-modality baselines and previous PES methods, and AMD-FED also performs robustly on Figure Skating.
  • The results indicate that representation-level multimodal distillation—especially skeleton-to-visual transfer—can be particularly effective for few-shot precise event spotting.
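To make the AWD idea concrete, here is a minimal sketch of confidence-weighted prediction distillation on unlabeled frames. The paper's exact weighting function is not given in this summary, so the teacher's maximum class probability is used as an illustrative adaptive weight, and all names here are hypothetical:

```python
import math

def awd_distill_loss(teacher_probs, student_probs, eps=1e-8):
    """Sketch of prediction-level distillation with adaptive weighting.

    Each unlabeled frame's KL(teacher || student) term is reweighted by
    the teacher's confidence (its max probability). The confidence-based
    weight is an assumption for illustration, not the paper's exact rule.
    """
    total, weight_sum = 0.0, 0.0
    for t, s in zip(teacher_probs, student_probs):
        w = max(t)  # adaptive weight: trust confident teacher frames more
        kl = sum(ti * math.log((ti + eps) / (si + eps))
                 for ti, si in zip(t, s))
        total += w * kl
        weight_sum += w
    return total / max(weight_sum, eps)
```

With this weighting, frames where the teacher is uncertain (near-uniform predictions) contribute less to the student's distillation loss, which is one plausible way to suppress noisy supervision on unlabeled clips.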

Abstract

Precise Event Spotting (PES) is essential in fast-paced sports such as tennis, where fine-grained events occur within very short temporal windows. Accurate frame-level localization is challenging because of motion blur, subtle action differences, and limited annotated data. We study two complementary distillation strategies for few-shot PES: Adaptive Weight Distillation (AWD), a prediction-level method that adaptively weights teacher supervision on unlabeled data, and Annealed Multimodal Distillation for Few-Shot Event Detection (AMD-FED), a representation-level framework that transfers robust skeleton knowledge into visual modalities through annealed pseudo-labeling. Both methods use multimodal distillation to improve generalization under limited supervision. We evaluate them on F3Set-Tennis(sub) under few-shot k-clip settings, where they consistently outperform single-modality baselines and prior PES approaches. After observing the stronger performance of representation-level distillation on tennis, we further validate AMD-FED on a second sports dataset, Figure Skating, where it also shows robust performance in the k-clip scenario. These results highlight the effectiveness of multimodal distillation, especially representation-level transfer, for few-shot precise event spotting.
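The annealed pseudo-labeling behind AMD-FED can be illustrated with a simple weight schedule: the influence of the skeleton teacher's pseudo-labels decays as training progresses, gradually shifting the visual student from teacher supervision toward the few labeled clips. The cosine form and the function names below are illustrative assumptions, not the paper's exact schedule:

```python
import math

def annealed_weight(step, total_steps, w_start=1.0, w_end=0.0):
    """Cosine decay of the pseudo-label weight from w_start to w_end.

    The schedule shape is an assumption for illustration; the paper may
    use a different annealing curve.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return w_end + 0.5 * (w_start - w_end) * (1.0 + math.cos(math.pi * progress))

def total_loss(step, total_steps, supervised_loss, distill_loss):
    """Hypothetical blend: annealed skeleton-distillation term plus the
    complementary supervised term on the labeled k clips."""
    w = annealed_weight(step, total_steps)
    return (1.0 - w) * supervised_loss + w * distill_loss
```

Early in training the student leans almost entirely on the skeleton teacher's pseudo-labels; by the end, the few ground-truth clips dominate, which matches the intuition that annealing protects the student from teacher noise once it has absorbed the transferable structure.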