Gemini vs Grok: Playing Towers of Annoy

Reddit r/artificial / 4/24/2026


Key Points

  • Large language models were tasked with writing a Python client to play a two-player adversarial version of Towers of Hanoi: after each hero move, the villain must immediately move the same disk to an adjacent tower.
  • The contest design makes mistakes costly by giving the hero a move budget of 2^m + 1, only slightly above the solo optimum, so wasted moves typically lead to losses.
  • Models competed in a round-robin tournament with head-to-head matchups that used multiple rounds (including sudden death) and ran two simultaneous games per round with swapped hero/villain roles.
  • The challenge scaled in difficulty from 4 towers/3 disks up to 12 towers/7 disks, testing the models’ ability to handle increasingly complex adversarial planning.
  • According to the detailed write-up, Gemini “aced” the challenge; the write-up reports results across the full tournament.

LLMs were asked to write a Python 3.10 client that plays a two-player adversarial variant of the Towers of Hanoi.

Rules: Hero moves a disk; Villain must then immediately move that same disk to an adjacent tower (or pass if no legal move exists). Hero's budget is 2^m + 1 moves, where m is the number of disks. That is barely more than the 2^m - 1 solo optimum, so almost any wasted move loses. The tournament is a round-robin with penalty-shootout matchups: up to 5 rounds (plus sudden death), with 2 simultaneous games per round in which the hero/villain roles are swapped. Round configurations grow from 4 towers / 3 disks up to 12 towers / 7 disks.
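
To make the rules concrete, here is a minimal Python sketch of the budget arithmetic and of the villain's forced reply. It is not the contest's actual client or referee code. The function names are hypothetical, and two details are assumptions the post does not spell out: towers are taken to sit in a line (so "adjacent" means index ±1), and the usual Hanoi restriction applies (a disk may not be placed on a smaller one).

```python
def hero_budget(m: int) -> int:
    """Hero's move budget for m disks, as stated in the post: 2^m + 1."""
    return 2**m + 1


def solo_optimum(m: int) -> int:
    """Classic three-tower solo optimum for m disks: 2^m - 1."""
    return 2**m - 1


def villain_replies(towers: list[list[int]], src: int) -> list[int]:
    """Towers the villain may move the top disk of towers[src] to.

    Each tower is a list of disk sizes, bottom first, top last.
    ASSUMPTION: towers are arranged in a line, so the neighbours of
    tower i are i - 1 and i + 1 (no wrap-around).
    ASSUMPTION: a disk may only land on an empty tower or a larger disk.
    An empty result means the villain has no legal move and must pass.
    """
    disk = towers[src][-1]
    replies = []
    for dst in (src - 1, src + 1):
        if 0 <= dst < len(towers):
            if not towers[dst] or towers[dst][-1] > disk:
                replies.append(dst)
    return replies


if __name__ == "__main__":
    for m in (3, 7):
        print(m, solo_optimum(m), hero_budget(m))
    # Hero has just moved disk 1 onto tower 1; both neighbours accept it.
    print(villain_replies([[3, 2], [1], [], []], 1))
```

Under these formulas the gap between the budget and the solo optimum is 2 moves for every m, which is why the post says almost any wasted move loses.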

Full writeup

submitted by /u/reditzer