Towards Automatic Soccer Commentary Generation with Knowledge-Enhanced Visual Reasoning

arXiv cs.AI / 4/2/2026


Key Points

  • The paper argues that end-to-end automatic soccer commentary falls short in real live televised settings because it contains anonymous entities and context-dependent errors and lacks statistical insight.
  • It introduces GameSight, a two-stage system that first performs knowledge-enhanced visual reasoning to align mentioned entities (players/teams) using fine-grained visual and contextual analysis.
  • GameSight then refines the entity-aligned commentary by injecting external historical statistics and iteratively updating an internal game state to improve factuality and relevance.
  • Reported results show an 18.5% improvement in player alignment accuracy on the SN-Caption-test-align dataset versus Gemini 2.5-pro, along with gains in segment-level accuracy, commentary quality, and game-level contextual relevance.
  • The work positions this approach as a step toward more informative, human-centric AI sports experiences and provides a demo page for evaluation.
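The two-stage flow described in the points above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not GameSight's implementation: the names `GameState`, `align_entities`, `refine_with_knowledge`, and `comment`, the indexed placeholders such as `[PLAYER_1]`, and the segment dictionary layout are all hypothetical. In particular, stage 1 here takes the resolved names as given, whereas the paper obtains them through visual and contextual reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Hypothetical internal game state, updated after every segment."""
    score: dict = field(default_factory=dict)   # team name -> goals so far
    events: list = field(default_factory=list)  # (minute, event type, player)

    def update(self, minute, event_type, player, team):
        self.events.append((minute, event_type, player))
        if event_type == "goal":
            self.score[team] = self.score.get(team, 0) + 1


def align_entities(anonymous_text, resolved):
    """Stage 1 stand-in: swap anonymous placeholders for resolved names.

    `resolved` maps placeholder -> name. Producing that mapping is the
    hard part (visual reasoning) and is assumed to be done elsewhere.
    """
    for placeholder, name in resolved.items():
        anonymous_text = anonymous_text.replace(placeholder, name)
    return anonymous_text


def refine_with_knowledge(text, player, history, state):
    """Stage 2 stand-in: append a historical fact and the running score."""
    parts = [text]
    if player in history:
        parts.append(history[player])
    if state.score:
        score = ", ".join(f"{t} {g}" for t, g in sorted(state.score.items()))
        parts.append(f"Score: {score}.")
    return " ".join(parts)


def comment(segment, history, state):
    """Run both stages on one segment, updating the game state in between."""
    aligned = align_entities(segment["text"], segment["resolved"])
    state.update(segment["minute"], segment["event"],
                 segment["player"], segment["team"])
    return refine_with_knowledge(aligned, segment["player"], history, state)
```

Calling `comment` once per segment with the same `GameState` instance lets later segments see the score and events accumulated by earlier ones, which is the sense in which the state is updated iteratively here.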

Abstract

Soccer commentary plays a crucial role in enhancing the viewing experience for audiences. Previous studies in automatic soccer commentary generation typically adopt an end-to-end method to generate anonymous live text commentary. Such commentary falls short of real-world live televised commentary: it contains anonymous entities and context-dependent errors, and it lacks statistical insight into game events. To bridge this gap, we propose GameSight, a two-stage model that treats soccer commentary generation as a knowledge-enhanced visual reasoning task, enabling knowledgeable, live-television-style commentary with accurate references to entities (players and teams). GameSight first performs visual reasoning to align anonymous entities through fine-grained visual and contextual analysis. The entity-aligned commentary is then refined with knowledge by incorporating external historical statistics and iteratively updated internal game state information. As a result, GameSight improves player alignment accuracy by 18.5% on the SN-Caption-test-align dataset compared to Gemini 2.5-pro. Combined with further knowledge enhancement, GameSight also shows gains in segment-level accuracy and commentary quality, as well as in game-level contextual relevance and structural composition. We believe that our work paves the way for more informative and engaging human-centric experiences with AI sports applications. Demo Page: https://gamesight2025.github.io/gamesight2025
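The abstract does not define how player alignment accuracy is computed, nor whether the 18.5% figure is an absolute or a relative gain. As a rough illustration only, one plausible reading is exact-match accuracy over anonymized entity slots. The function name `alignment_accuracy` and the slot keys below are assumptions made for this sketch, not the paper's evaluation code.

```python
def alignment_accuracy(predicted, reference):
    """Fraction of reference entity slots whose predicted name matches exactly.

    Both arguments map a slot key, for example (segment_id, slot_index),
    to a player name. Slots missing from `predicted` count as wrong.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    correct = sum(
        1 for slot, name in reference.items() if predicted.get(slot) == name
    )
    return correct / len(reference)
```

Under this reading, a model that resolves three of four reference slots correctly would score 0.75; comparing two systems then amounts to comparing these fractions on the same reference set.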