LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo
arXiv cs.AI / 4/8/2026
Key Points
- LudoBench is introduced as a new benchmark for evaluating LLM strategic decision-making in Ludo, a stochastic multi-agent board game with dice-based uncertainty and planning-relevant mechanics.
- The benchmark includes 480 handcrafted spot scenarios across 12 decision categories, and it isolates specific strategic choices to make model behavior easier to interpret and diagnose.
- The accompanying 4-player Ludo simulator supports Random, Heuristic, Game-Theory (depth-limited Expectiminimax), and LLM agents, enabling comparisons against a principled strategic baseline.
- Experiments across six models show low alignment with the game-theory agent (only 40–46% move agreement), with models clustering into two incomplete strategy archetypes: “finishers” and “builders.”
- Models also exhibit sensitivity to prompt and history framing, including measurable behavioral shifts under grudge-style framing on identical board states, indicating fragile reasoning under uncertainty.
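The game-theory baseline mentioned above uses depth-limited Expectiminimax, which extends minimax with chance nodes that average over dice outcomes. The sketch below is a minimal illustration of that algorithm on a toy stochastic game, not the paper's implementation; the `ToyGame` interface (`is_terminal`, `evaluate`, `node_type`, `children`, `outcomes`) is a hypothetical API invented here for clarity.

```python
def expectiminimax(state, depth, game):
    """Depth-limited Expectiminimax for a stochastic turn-based game.

    `game` is a hypothetical interface (not from the paper) supplying:
    is_terminal / evaluate for leaves, node_type ("max", "min", "chance"),
    children for player nodes, and (probability, child) pairs for chance nodes.
    """
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    kind = game.node_type(state)
    if kind == "chance":
        # Weight each dice outcome by its probability (expected value).
        return sum(p * expectiminimax(s, depth - 1, game)
                   for p, s in game.outcomes(state))
    values = [expectiminimax(s, depth - 1, game)
              for s in game.children(state)]
    return max(values) if kind == "max" else min(values)


class ToyGame:
    """Tiny example: at the root, choose a sure payoff of 3.0 ("safe")
    or a fair six-sided die roll ("risky") paying the face value."""

    def is_terminal(self, s):
        return s == "safe" or (isinstance(s, tuple) and s[0] == "roll")

    def evaluate(self, s):
        return 3.0 if s == "safe" else float(s[1])

    def node_type(self, s):
        return "max" if s == "root" else "chance"

    def children(self, s):
        return ["safe", "risky"]

    def outcomes(self, s):
        return [(1 / 6, ("roll", k)) for k in range(1, 7)]


value = expectiminimax("root", 2, ToyGame())  # → 3.5 (die beats sure 3.0)
```

A depth limit keeps the search tractable in Ludo, where each chance node branches six ways per die roll; the evaluation function at the cutoff then stands in for deeper search, which is the usual trade-off in such baselines.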
