Scouting By Reward: VLM-TO-IRL-Driven Player Selection For Esports

arXiv cs.LG / 4/17/2026


Key Points

  • The paper proposes reframing esports player scouting as an Inverse Reinforcement Learning (IRL) problem to better capture nuanced tactical decision patterns beyond aggregate performance metrics.
  • It introduces a player-selection framework that learns professional-specific reward functions from logged gameplay demonstrations, ranking prospects by stylistic alignment with a target star player.
  • The architecture uses multimodal, two-branch inputs combining structured state-action trajectories from in-game telemetry with temporally aligned tactical pseudo-commentary generated from broadcast footage by Vision-Language Models (VLMs).
  • A Generative Adversarial Imitation Learning (GAIL) setup trains a discriminator to learn elite professionals’ distinctive mechanical and tactical signatures for candidate evaluation.
  • The approach aims to enable scalable, workflow-aware “digital twin” roster construction for targeted talent discovery across very large candidate pools.
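The ranking step described above can be illustrated with a minimal sketch. The paper trains a GAIL discriminator as a neural network; here a simple logistic score over hand-made trajectory features stands in for it, and the feature dimensions, candidate names, and `star_style` vector are all hypothetical placeholders, not details from the paper.

```python
import math

def discriminator(trajectory, style_vector):
    """Stub for a GAIL discriminator: sigmoid of a dot product between a
    candidate's trajectory feature vector and the target pro's learned
    style vector. In the paper this is a trained network; the linear
    form here is purely illustrative."""
    logit = sum(t * s for t, s in zip(trajectory, style_vector))
    return 1.0 / (1.0 + math.exp(-logit))

def rank_candidates(candidates, style_vector):
    """Rank prospects by mean discriminator score over their logged
    trajectories (higher = closer to the target player's style)."""
    scored = []
    for name, trajectories in candidates.items():
        mean_score = sum(discriminator(t, style_vector)
                         for t in trajectories) / len(trajectories)
        scored.append((name, mean_score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-dimensional trajectory features (e.g., aggression,
# map control, objective focus) and a target star player's style vector.
star_style = [1.5, -0.5, 2.0]
pool = {
    "prospect_a": [[1.2, -0.3, 1.8], [1.4, -0.6, 2.1]],
    "prospect_b": [[-1.0, 0.8, -0.5], [-0.8, 0.5, -1.2]],
}
ranking = rank_candidates(pool, star_style)
```

With these toy numbers, `prospect_a`, whose trajectories align with the star's style vector, ranks above `prospect_b`.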

Abstract

Traditional esports scouting workflows rely heavily on manual video review and aggregate performance metrics, which often fail to capture the nuanced decision-making patterns necessary to determine whether a prospect fits a specific tactical archetype. To address this, we reframe style-based player evaluation in esports as an Inverse Reinforcement Learning (IRL) problem. In this paper, we introduce a novel player selection framework that learns professional-specific reward functions from logged gameplay demonstrations, allowing organizations to rank candidates by their stylistic alignment with a target star player. Our proposed architecture uses a multimodal, two-branch input pipeline: one branch encodes structured state-action trajectories derived from high-resolution in-game telemetry, while the second encodes temporally aligned tactical pseudo-commentary generated by Vision-Language Models (VLMs) from broadcast footage. These representations are fused and evaluated via a Generative Adversarial Imitation Learning (GAIL) objective, where a discriminator learns to capture the unique mechanical and tactical signatures of elite professionals. By transitioning from generic skill estimation to scouting "by reward," this framework provides a scalable, workflow-aware digital twin system that enables data-driven roster construction and targeted talent discovery across massive candidate pools.
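The two-branch fusion the abstract describes can be sketched minimally. The paper does not specify its encoders or fusion operator, so everything below is an assumption: per-dimension means stand in for a learned telemetry sequence encoder, a tiny bag-of-words over a made-up `TACTICAL_VOCAB` stands in for a text encoder of the VLM pseudo-commentary, and fusion is plain concatenation.

```python
def encode_telemetry(states_actions):
    """Telemetry branch: summarize a state-action trajectory with
    per-dimension means (a stand-in for a learned sequence encoder)."""
    dims = len(states_actions[0])
    steps = len(states_actions)
    return [sum(step[d] for step in states_actions) / steps
            for d in range(dims)]

# Hypothetical vocabulary of tactical terms a VLM commentator might emit.
TACTICAL_VOCAB = ["flank", "rotate", "engage", "retreat", "hold"]

def encode_commentary(pseudo_commentary):
    """Commentary branch: bag-of-words counts of tactical terms in the
    VLM-generated pseudo-commentary (a stand-in for a text encoder)."""
    tokens = pseudo_commentary.lower().split()
    return [tokens.count(word) for word in TACTICAL_VOCAB]

def fuse(telemetry_vec, commentary_vec):
    """Late fusion by concatenation; the paper's fusion details are not
    given here, so the simplest choice is used."""
    return telemetry_vec + commentary_vec

traj = [[0.1, 0.9], [0.3, 0.7]]          # two timesteps, two features
comment = "player decides to flank then rotate and engage"
fused = fuse(encode_telemetry(traj), encode_commentary(comment))
```

The fused vector would then feed the GAIL discriminator; in a real system both branches would be trained jointly rather than hand-coded.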