Best Agent Identification for General Game Playing

arXiv stat.ML / 4/22/2026

Key Points

  • The paper introduces a general procedure to identify the best (or near-best) performing algorithm for each sub-task in multi-problem domains by modeling it as multi-armed bandit best-arm identification.
  • Each bandit represents a task and each arm represents an agent/algorithm, and the method uses an optimistic confidence-interval-based selection strategy to rank arms by their potential impact on simple regret.
  • Experiments on the General Video Game AI (GVGAI) framework and the Ludii general game playing system show substantial improvements over prior best-arm identification methods, lowering both average simple regret and average probability of error.
  • The approach is positioned as a way to improve agent evaluation quality and accuracy for general game playing frameworks and other multi-task settings where algorithm runtime is high.
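
The two evaluation metrics named above can be written down under standard bandit notation. The symbols here are assumptions for illustration and are not taken from the paper: M tasks (bandits), K agents (arms), \mu_{m,k} the true mean score of agent k on task m, and J_m the agent recommended for task m once the trial budget is spent.

```latex
% Assumed notation: k_m^* is the truly best agent on task m.
k_m^* = \arg\max_{k \in \{1,\dots,K\}} \mu_{m,k}

% Simple regret on task m, and its average over all tasks.
r_m = \mu_{m,k_m^*} - \mu_{m,J_m},
\qquad
\bar{r} = \frac{1}{M} \sum_{m=1}^{M} r_m

% Probability of error on task m, and its average over all tasks.
e_m = \Pr\left( J_m \neq k_m^* \right),
\qquad
\bar{e} = \frac{1}{M} \sum_{m=1}^{M} e_m
```

A recommendation that is wrong but near-best adds to \bar{e} while adding little to \bar{r}, which is why the two metrics are reported separately.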

Abstract

We present an efficient and generalised procedure to accurately identify the best (or near-best) performing algorithm for each sub-task in a multi-problem domain. Our approach treats this as a set of best arm identification problems for multi-armed bandits, where each bandit corresponds to a specific task and each arm corresponds to a specific algorithm or agent. We propose an optimistic selection process, based on a chosen confidence interval, that ranks each arm across all bandits by its potential to influence our overall simple regret. We evaluate the performance of our approach on two of the most popular general game playing domains, the General Video Game AI (GVGAI) framework and the Ludii general game playing system, with the goal of selecting a high-performing agent for each game using a limited number of available trials. Compared to previous best arm identification algorithms for multi-armed bandits, our results demonstrate a substantial performance improvement in terms of average simple regret and average probability of error. This novel approach can be used to significantly improve the quality and accuracy of agent evaluation procedures for general game frameworks, as well as other multi-task domains with high algorithm runtimes.
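
To make the setting concrete, here is a minimal sketch of budgeted best-agent identification across several tasks. It is not the paper's selection rule: it uses a generic index (a Hoeffding-style confidence radius minus the empirical gap to the strongest rival agent) chosen only for illustration. The names `hoeffding_radius`, `select_pair`, `identify_best_agents`, the `scale` parameter, and the assumption that scores lie in [0, 1] are all hypothetical.

```python
import math
import random


def hoeffding_radius(n_pulls, total_pulls, scale=1.0):
    """Confidence radius for scores in [0, 1]; `scale` is a hypothetical tuning knob."""
    return scale * math.sqrt(math.log(max(total_pulls, 2)) / (2 * n_pulls))


def select_pair(means, counts, total_pulls, scale=1.0):
    """Pick the (task, agent) pair with the largest optimistic index over all tasks."""
    best_pair, best_index = None, -math.inf
    for t, task_means in enumerate(means):
        for a, mu in enumerate(task_means):
            # Empirical gap between this agent and its strongest rival on the task.
            rival = max(m for j, m in enumerate(task_means) if j != a)
            gap = abs(mu - rival)
            # Small gap or wide interval: one more trial could change the recommendation.
            index = hoeffding_radius(counts[t][a], total_pulls, scale) - gap
            if index > best_index:
                best_pair, best_index = (t, a), index
    return best_pair


def identify_best_agents(pull, n_tasks, n_agents, budget, scale=1.0):
    """Spend `budget` trials, then recommend one agent per task.

    `pull(task, agent)` returns one score in [0, 1]. Requires n_agents >= 2
    and budget >= n_tasks * n_agents.
    """
    counts = [[0] * n_agents for _ in range(n_tasks)]
    sums = [[0.0] * n_agents for _ in range(n_tasks)]

    def record(t, a):
        sums[t][a] += pull(t, a)
        counts[t][a] += 1

    # Initialisation: try every agent once on every task.
    for t in range(n_tasks):
        for a in range(n_agents):
            record(t, a)
    used = n_tasks * n_agents

    while used < budget:
        means = [[s / c for s, c in zip(sums[t], counts[t])] for t in range(n_tasks)]
        t, a = select_pair(means, counts, used, scale)
        record(t, a)
        used += 1

    means = [[s / c for s, c in zip(sums[t], counts[t])] for t in range(n_tasks)]
    best = [max(range(n_agents), key=lambda a: means[t][a]) for t in range(n_tasks)]
    return best, counts


if __name__ == "__main__":
    # Hypothetical win rates for 2 games and 3 agents (made-up numbers).
    win_rates = [[0.45, 0.55, 0.50], [0.90, 0.10, 0.30]]
    rng = random.Random(0)

    def play(t, a):
        return 1.0 if rng.random() < win_rates[t][a] else 0.0

    best, counts = identify_best_agents(play, n_tasks=2, n_agents=3, budget=600)
    print("recommended agent per game:", best)
    print("trials per (game, agent):", counts)
```

Because the index is compared across all tasks at once, trials tend to flow towards games where the leading agents are hard to separate, rather than being split evenly per game. That shared-budget allocation is the feature this sketch has in common with the setting described in the abstract.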