Learning to Play Blackjack: A Curriculum Learning Perspective

arXiv cs.AI / 4/2/2026


Key Points

  • The paper introduces an LLM-guided curriculum framework for reinforcement learning that dynamically generates a staged training path over available actions.
  • It applies the approach to Blackjack by progressively introducing actions to both a Tabular Q-Learning agent and a Deep Q-Network (DQN) agent.
  • In an 8-deck simulation across 10 runs, the curriculum method improves the DQN agent’s average win rate from 43.97% to 47.41% and lowers bust rate from 32.9% to 28.0%.
  • The LLM-generated curriculum also substantially accelerates the training workflow, with more than a 74% overall speedup and full training finishing faster than the baseline’s evaluation phase.
  • The results support the claim that LLM-guided curricula can make RL agents more effective, more robust, and more sample- and time-efficient.

Abstract

Reinforcement Learning (RL) agents often struggle with efficiency and performance in complex environments. We propose a novel framework that uses a Large Language Model (LLM) to dynamically generate a curriculum over available actions, enabling the agent to incorporate each action individually. We apply this framework to the game of Blackjack, where the LLM creates a multi-stage training path that progressively introduces complex actions to a Tabular Q-Learning agent and a Deep Q-Network (DQN) agent. Our evaluation in a realistic 8-deck simulation over 10 independent runs demonstrates significant performance gains over standard training methods. The curriculum-based approach increases the DQN agent's average win rate from 43.97% to 47.41%, reduces the average bust rate from 32.9% to 28.0%, and accelerates the overall workflow by over 74%, with the agent's full training completing faster than the baseline's evaluation phase alone. These results validate that LLM-guided curricula can build more effective, robust, and efficient RL agents.
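To make the staged-action idea concrete, the sketch below shows one plausible way a curriculum could gate a tabular Q-Learning agent's action set in Blackjack. This is a minimal illustration under assumptions, not the paper's implementation: the `CURRICULUM` stages are hand-written here (the paper generates them with an LLM), and the environment is a simplified infinite-deck Blackjack rather than the paper's 8-deck simulation.

```python
import random
from collections import defaultdict

# Illustrative staged curriculum: each stage unlocks a larger action set.
# In the paper this schedule is produced by an LLM; here it is hard-coded.
# Action indices: 0 = stand, 1 = hit, 2 = double down.
CURRICULUM = [
    {"actions": [0, 1], "episodes": 2000},     # stage 1: stand/hit only
    {"actions": [0, 1, 2], "episodes": 2000},  # stage 2: double unlocked
]

def draw_card(rng):
    # Infinite-deck simplification; face cards count as 10.
    return min(rng.randint(1, 13), 10)

def play_episode(q, actions, epsilon, alpha, rng):
    """One episode of simplified Blackjack with tabular Q-learning,
    restricted to the action set allowed by the current stage."""
    player = draw_card(rng) + draw_card(rng)
    dealer_up = draw_card(rng)
    bet = 1
    state = (player, dealer_up)
    while True:
        # Epsilon-greedy selection, masked to the stage's allowed actions.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: q[state][a])
        if action == 2:        # double: take one card on a doubled bet
            bet = 2
            player += draw_card(rng)
        elif action == 1:      # hit
            player += draw_card(rng)
        if action != 1 or player > 21:
            break              # stand, double, or bust ends the player's turn
        next_state = (player, dealer_up)
        # Intermediate update (reward 0, undiscounted), max over allowed actions.
        q[state][action] += alpha * (
            max(q[next_state][a] for a in actions) - q[state][action]
        )
        state = next_state
    # Terminal reward: bust loses; otherwise the dealer draws to 17.
    if player > 21:
        reward = -bet
    else:
        dealer = dealer_up + draw_card(rng)
        while dealer < 17:
            dealer += draw_card(rng)
        win = player > dealer or dealer > 21
        loss = player < dealer <= 21
        reward = bet * (int(win) - int(loss))
    q[state][action] += alpha * (reward - q[state][action])

def train(seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: defaultdict(float))
    for stage in CURRICULUM:
        for _ in range(stage["episodes"]):
            play_episode(q, stage["actions"], epsilon=0.1, alpha=0.1, rng=rng)
    return q

q_table = train()
```

The point of the sketch is the structure of the training loop: early stages explore a small action set so the agent first learns a stable hit/stand policy, and later stages expand both the action mask and the bootstrapping target (`max` over allowed actions) to cover newly introduced actions.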