Scouting By Reward: VLM-TO-IRL-Driven Player Selection For Esports

arXiv cs.LG / 4/17/2026


Key Points

  • The paper proposes reframing esports player scouting as an Inverse Reinforcement Learning (IRL) problem to better capture nuanced tactical decision patterns beyond aggregate performance metrics.
  • It introduces a player-selection framework that learns professional-specific reward functions from logged gameplay demonstrations, ranking prospects by stylistic alignment with a target star player.
  • The architecture uses multimodal, two-branch inputs combining structured state-action trajectories from in-game telemetry with temporally aligned tactical pseudo-commentary generated from broadcast footage by Vision-Language Models (VLMs).
  • A Generative Adversarial Imitation Learning (GAIL) setup trains a discriminator to learn elite professionals’ distinctive mechanical and tactical signatures for candidate evaluation.
  • The approach aims to enable scalable, workflow-aware “digital twin” roster construction for targeted talent discovery across very large candidate pools.
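The ranking step described above can be illustrated with a minimal sketch. The paper trains a GAIL discriminator as a neural network; here a simple logistic score over hand-made trajectory features stands in for it, and the feature dimensions, candidate names, and `star_style` vector are all hypothetical placeholders, not details from the paper.

```python
import math

def discriminator(trajectory, style_vector):
    """Stub for a GAIL discriminator: sigmoid of a dot product between a
    candidate's trajectory feature vector and the target pro's learned
    style vector. In the paper this is a trained network; the linear
    form here is purely illustrative."""
    logit = sum(t * s for t, s in zip(trajectory, style_vector))
    return 1.0 / (1.0 + math.exp(-logit))

def rank_candidates(candidates, style_vector):
    """Rank prospects by mean discriminator score over their logged
    trajectories (higher = closer to the target player's style)."""
    scored = []
    for name, trajectories in candidates.items():
        mean_score = sum(discriminator(t, style_vector)
                         for t in trajectories) / len(trajectories)
        scored.append((name, mean_score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-dimensional trajectory features (e.g., aggression,
# map control, objective focus) and a target star player's style vector.
star_style = [1.5, -0.5, 2.0]
pool = {
    "prospect_a": [[1.2, -0.3, 1.8], [1.4, -0.6, 2.1]],
    "prospect_b": [[-1.0, 0.8, -0.5], [-0.8, 0.5, -1.2]],
}
ranking = rank_candidates(pool, star_style)
```

With these toy numbers, `prospect_a`, whose trajectories align with the star's style vector, ranks above `prospect_b`.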

Abstract

Traditional esports scouting workflows rely heavily on manual video review and aggregate performance metrics, which often fail to capture the nuanced decision-making patterns necessary to determine whether a prospect fits a specific tactical archetype. To address this, we reframe style-based player evaluation in esports as an Inverse Reinforcement Learning (IRL) problem. In this paper, we introduce a novel player selection framework that learns professional-specific reward functions from logged gameplay demonstrations, allowing organizations to rank candidates by their stylistic alignment with a target star player. Our proposed architecture uses a multimodal, two-branch input pipeline: one branch encodes structured state-action trajectories derived from high-resolution in-game telemetry, while the second encodes temporally aligned tactical pseudo-commentary generated by Vision-Language Models (VLMs) from broadcast footage. These representations are fused and evaluated via a Generative Adversarial Imitation Learning (GAIL) objective, where a discriminator learns to capture the unique mechanical and tactical signatures of elite professionals. By transitioning from generic skill estimation to scouting "by reward," this framework provides a scalable, workflow-aware digital twin system that enables data-driven roster construction and targeted talent discovery across massive candidate pools.
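The two-branch fusion the abstract describes can be sketched minimally. The paper does not specify its encoders or fusion operator, so everything below is an assumption: per-dimension means stand in for a learned telemetry sequence encoder, a tiny bag-of-words over a made-up `TACTICAL_VOCAB` stands in for a text encoder of the VLM pseudo-commentary, and fusion is plain concatenation.

```python
def encode_telemetry(states_actions):
    """Telemetry branch: summarize a state-action trajectory with
    per-dimension means (a stand-in for a learned sequence encoder)."""
    dims = len(states_actions[0])
    steps = len(states_actions)
    return [sum(step[d] for step in states_actions) / steps
            for d in range(dims)]

# Hypothetical vocabulary of tactical terms a VLM commentator might emit.
TACTICAL_VOCAB = ["flank", "rotate", "engage", "retreat", "hold"]

def encode_commentary(pseudo_commentary):
    """Commentary branch: bag-of-words counts of tactical terms in the
    VLM-generated pseudo-commentary (a stand-in for a text encoder)."""
    tokens = pseudo_commentary.lower().split()
    return [tokens.count(word) for word in TACTICAL_VOCAB]

def fuse(telemetry_vec, commentary_vec):
    """Late fusion by concatenation; the paper's fusion details are not
    given here, so the simplest choice is used."""
    return telemetry_vec + commentary_vec

traj = [[0.1, 0.9], [0.3, 0.7]]          # two timesteps, two features
comment = "player decides to flank then rotate and engage"
fused = fuse(encode_telemetry(traj), encode_commentary(comment))
```

The fused vector would then feed the GAIL discriminator; in a real system both branches would be trained jointly rather than hand-coded.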