ViTs for Action Classification in Videos: An Approach to Risky Tackle Detection in American Football Practice Videos

arXiv cs.CV / 4/3/2026

Key Points

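The study aims to detect risky tackling actions early in American football practice videos, enabling timely intervention and improving player safety.
It introduces a substantially expanded dataset of 733 tackle clips, each showing a single athlete tackling a dummy, temporally localized around first contact, and labeled with the strike zone component of SATT-3.
Combining Vision Transformer (ViT)-based video analysis with class-imbalance-aware training, the method reaches a risky recall of 0.67 and a risky F1 of 0.59 under cross-validation.
Compared with the earlier small-scale baseline (risky recall 0.58, risky F1 0.56), the authors report that risky recall improves by more than 8 percentage points on a much larger dataset.
The results suggest that ViTs combined with imbalance handling could make it practical to detect rare but safety-critical tackling patterns.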
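The headline numbers are metrics on the risky class only. As a minimal sketch, assuming binary labels with 1 = risky (a hypothetical encoding; the summary does not say how labels are encoded or how scores are aggregated across cross-validation folds), risky precision, recall, and F1 can be computed as follows. The two helper functions at the end separate an absolute gain in percentage points from a relative gain, since the 0.58 to 0.67 recall change reads very differently under each.

```python
from typing import Dict, Sequence


def risky_metrics(y_true: Sequence[int], y_pred: Sequence[int], risky: int = 1) -> Dict[str, float]:
    """Precision, recall and F1 computed for the 'risky' (positive) class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == risky and p == risky)  # risky, flagged
    fp = sum(1 for t, p in pairs if t != risky and p == risky)  # safe, flagged
    fn = sum(1 for t, p in pairs if t == risky and p != risky)  # risky, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def point_gain(new: float, old: float) -> float:
    """Absolute change in percentage points, e.g. 0.58 -> 0.67."""
    return round((new - old) * 100, 2)


def relative_gain(new: float, old: float) -> float:
    """Relative change in percent of the old value."""
    return round((new - old) / old * 100, 1)


if __name__ == "__main__":
    # Toy labels: 3 risky and 3 safe clips, one miss and one false alarm.
    print(risky_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
    print(point_gain(0.67, 0.58), relative_gain(0.67, 0.58))
```

For the reported recall values, `point_gain(0.67, 0.58)` gives 9.0 points, consistent with "more than 8 percentage points", whereas the relative improvement is about 15.5%.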

Abstract

Early identification of hazardous actions in contact sports enables timely intervention and improves player safety. We present a method for detecting risky tackles in American football practice videos and introduce a substantially expanded dataset for this task. The dataset contains 733 tackle clips, each showing a single athlete tackling a dummy, temporally localized around the first point of contact, and labeled with the strike zone component of the standardized Assessment for Tackling Technique (SATT-3); prior work reported 178 annotated videos. Using a Vision Transformer (ViT)-based model with imbalance-aware training, we obtain a risky recall of 0.67 and a risky F1 of 0.59 under cross-validation. Relative to the previous baseline on a smaller subset (risky recall 0.58; risky F1 0.56), our approach improves risky recall by more than 8 percentage points on a much larger dataset. These results indicate that ViT-based video analysis, coupled with careful handling of class imbalance, can reliably detect rare but safety-critical tackling patterns, offering a practical pathway toward coach-centered injury prevention tools.
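The abstract does not specify what "imbalance-aware training" involves. One common option, shown here only as an assumed illustration and not as the authors' method, is to weight the cross-entropy loss by inverse class frequency so that mistakes on the rare risky class cost more. The helper names are hypothetical. The weighting formula matches scikit-learn's `"balanced"` class weights, and dividing by the summed target weights mirrors how PyTorch's `nn.CrossEntropyLoss` averages when a `weight` tensor is supplied.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight class c by N / (K * n_c): rare classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(
    probs: List[List[float]], labels: Sequence[int], weights: Dict[int, float]
) -> float:
    """Weighted mean of -log p[y], normalized by the sum of per-sample weights."""
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        norm += w
    return total / norm


if __name__ == "__main__":
    labels = [0, 0, 0, 1]  # hypothetical 3:1 safe (0) vs risky (1) split
    w = inverse_frequency_weights(labels)
    print(w)
    print(weighted_cross_entropy([[0.9, 0.1]] * 4, labels, w))
```

With a 3:1 safe-to-risky split, the risky class gets weight 2.0 versus about 0.67 for the safe class, so the misclassified risky sample in the usage example dominates the loss. Oversampling risky clips or a focal loss are alternative ways to address the same imbalance.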
