AI Driven Soccer Analysis Using Computer Vision

arXiv cs.AI / 4/13/2026

Key Points

  • The paper proposes an AI-driven soccer analysis pipeline that uses computer vision to detect and track players across match footage for coaching and performance insights.
  • It evaluates object detection models (including YOLO and Faster R-CNN) on custom video footage to determine which yields the most accurate player identification before downstream segmentation/tracking.
  • To convert camera-perspective measurements into real field coordinates, the approach combines key-point detection (via a CNN) with homography to estimate field geometry and compute real distances.
  • It integrates SAM2 for segmentation and tracking, then transforms segmented player masks into real-world field coordinates to produce tactical outputs such as speed, distance covered, and positioning heatmaps.
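The camera-to-field mapping described in the key points can be illustrated with a minimal sketch. It assumes exactly four detected field key points with known field positions, and it solves for the 3x3 homography directly in plain Python. The pixel coordinates, the choice of the penalty-area corners as key points, and the function names (`solve_linear`, `homography_from_4_points`, `to_field`) are hypothetical and are not taken from the paper; a production pipeline would more likely use many key points and a robust estimator such as OpenCV's `cv2.findHomography`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate key-point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def homography_from_4_points(
    image_pts: Sequence[Point], field_pts: Sequence[Point]
) -> List[List[float]]:
    """Estimate H (with H[2][2] fixed to 1) mapping image pixels to field metres."""
    if len(image_pts) != 4 or len(field_pts) != 4:
        raise ValueError("this sketch needs exactly four correspondences")
    a, b = [], []
    for (x, y), (u, v) in zip(image_pts, field_pts):
        # Two linear equations per correspondence, from
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def to_field(h: List[List[float]], pt: Point) -> Point:
    """Project one image point onto the field plane."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical pixel positions of the four penalty-area corners in one frame.
image_corners = [(420.0, 310.0), (860.0, 305.0), (1010.0, 520.0), (250.0, 530.0)]
# Matching field positions in metres (assumed 40.32 m x 16.5 m penalty area).
field_corners = [(0.0, 0.0), (40.32, 0.0), (40.32, 16.5), (0.0, 16.5)]

H = homography_from_4_points(image_corners, field_corners)
# Hypothetical ground-contact point of a player, e.g. the bottom centre of a mask.
player_on_field = to_field(H, (640.0, 420.0))
print(player_on_field)
```

Projecting a ground-contact point rather than the mask centroid is a design choice in this sketch: the homography is only valid for points lying on the field plane, so a point near the player's feet is the natural candidate.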

Abstract

Sports analysis is crucial for team performance because it provides actionable data that can inform coaching decisions, improve player performance, and enhance team strategies. To analyze more complex features from game footage, a computer vision model can be used to identify and track key entities on the field. We propose an object detection and tracking system to estimate player positions throughout the game. To relate these positions to the field dimensions, we use a point prediction model to identify key points on the field and combine them with known field dimensions to extract actual distances. For player identification, object detection models such as YOLO and Faster R-CNN are evaluated for accuracy on our custom video footage using several evaluation metrics. The goal is to identify the detector that gives the most accurate results when paired with SAM2 (Segment Anything Model 2) for segmentation and tracking. For key-point detection, we use a CNN to find consistent locations on the soccer field. Through homography, the positions of points and objects, including the segmented player masks from SAM2, are transformed from the camera perspective to real-world field coordinates, regardless of camera angle or movement. The transformed coordinates can be used to calculate valuable tactical insights, including player speed, distance covered, positioning heatmaps, and more complex team statistics, providing coaches and players with actionable performance data previously unavailable from standard video analysis.
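Once a player's positions are expressed in field coordinates, the tactical metrics named in the abstract reduce to simple geometry. The following is a minimal sketch, not the paper's implementation: it assumes a track is a list of (x, y) positions in metres sampled at a fixed interval `dt`, and a 105 m x 68 m field divided into 5 m cells. The function names, the field size, the cell size, and the sample track are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def distance_covered(track: Sequence[Point]) -> float:
    """Total path length in metres along consecutive positions."""
    return sum(math.dist(p, q) for p, q in zip(track, track[1:]))


def speeds(track: Sequence[Point], dt: float) -> List[float]:
    """Speed in m/s for each step, given dt seconds between samples."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return [math.dist(p, q) / dt for p, q in zip(track, track[1:])]


def heatmap(
    track: Sequence[Point],
    field_length: float = 105.0,  # assumed field length in metres
    field_width: float = 68.0,    # assumed field width in metres
    cell: float = 5.0,            # assumed grid cell size in metres
) -> Dict[Tuple[int, int], int]:
    """Count how many samples fall into each (column, row) grid cell."""
    cols = math.ceil(field_length / cell)
    rows = math.ceil(field_width / cell)
    counts: Counter = Counter()
    for x, y in track:
        # Clamp so positions slightly outside the lines land in an edge cell.
        col = min(max(int(x // cell), 0), cols - 1)
        row = min(max(int(y // cell), 0), rows - 1)
        counts[(col, row)] += 1
    return dict(counts)


# Hypothetical track: one position per second, in field coordinates (metres).
track = [(10.0, 20.0), (13.0, 24.0), (16.0, 28.0)]
print(distance_covered(track))
print(speeds(track, dt=1.0))
print(heatmap(track))
```

In practice the per-frame positions are noisy, so a real pipeline would likely smooth the track before differencing it; otherwise jitter inflates both distance and speed.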