CreativeGame: Toward Mechanic-Aware Creative Game Generation

arXiv cs.AI / 4/23/2026


Key Points

  • The paper argues that current LLM-based game code generation is often brittle and fails to support reliable iterative improvement because it lacks strong, objective optimization signals and explicit treatment of game mechanics.
  • It introduces CreativeGame, a multi-agent HTML5 game generation pipeline that improves iteration by using a proxy reward from programmatic signals, lineage-scoped memory, and runtime validation tied to both repair and reward.
  • The system also adds a mechanic-guided planning loop that turns retrieved mechanic knowledge into an explicit mechanic plan before any code is generated, rather than treating mechanics as post-hoc text.
  • CreativeGame is implemented with substantial infrastructure (6,181 lines of Python plus inspection/visualization tooling) and includes 71 stored lineages, 88 saved nodes, and a 774-entry global mechanic archive, enabling architectural analysis and lineage-level case studies.
  • Results from a 4-generation lineage suggest that mechanic-level innovation can emerge across later versions and be directly inspected via version-to-version records, enabling interpretable evolution tracking.
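The proxy reward described in the key points can be sketched in a few lines. All names, signals, and weights below are hypothetical illustrations of the idea of "programmatic signals plus runtime validation tied to reward"; the paper's actual reward design is not specified in this summary.

```python
from dataclasses import dataclass


@dataclass
class RuntimeReport:
    """Hypothetical result of running a generated HTML5 game headlessly."""
    loads: bool           # page loaded without a fatal error
    console_errors: int   # JS errors observed during a scripted session
    frames_rendered: int  # evidence the game loop actually runs


def proxy_reward(report: RuntimeReport,
                 mechanics_planned: int,
                 mechanics_detected: int) -> float:
    """Combine objective programmatic signals into one scalar reward.

    Illustrative weighting: runtime health dominates, and mechanic
    coverage (planned mechanics observable at runtime) adds a bonus,
    so the signal does not rest on subjective LLM judgment alone.
    """
    if not report.loads:
        return 0.0  # a game that fails runtime validation earns nothing
    health = max(0.0, 1.0 - 0.1 * report.console_errors)
    alive = 1.0 if report.frames_rendered > 0 else 0.0
    coverage = (mechanics_detected / mechanics_planned
                if mechanics_planned else 0.0)
    return 0.5 * health + 0.2 * alive + 0.3 * coverage
```

Because every term is computed from observable runtime behavior, later versions in a lineage can be compared on the same scale, which is what makes iteration optimizable.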

Abstract

Large language models can generate plausible game code, but turning this capability into *iterative creative improvement* remains difficult. In practice, single-shot generation often produces brittle runtime behavior, weak accumulation of experience across versions, and creativity scores that are too subjective to serve as reliable optimization signals. A further limitation is that mechanics are frequently treated only as post-hoc descriptions, rather than as explicit objects that can be planned, tracked, preserved, and evaluated during generation. This report presents **CreativeGame**, a multi-agent system for iterative HTML5 game generation that addresses these issues through four coupled ideas: a proxy reward centered on programmatic signals rather than pure LLM judgment; lineage-scoped memory for cross-version experience accumulation; runtime validation integrated into both repair and reward; and a mechanic-guided planning loop in which retrieved mechanic knowledge is converted into an explicit mechanic plan before code generation begins. The goal is not merely to produce a playable artifact in one step, but to support interpretable version-to-version evolution. The current system contains 71 stored lineages, 88 saved nodes, and a 774-entry global mechanic archive, implemented in 6,181 lines of Python together with inspection and visualization tooling. The system is therefore substantial enough to support architectural analysis, reward inspection, and real lineage-level case studies rather than only prompt-level demos. A real 4-generation lineage shows that mechanic-level innovation can emerge in later versions and can be inspected directly through version-to-version records. The central contribution is therefore not only game generation, but a concrete pipeline for observing progressive evolution through explicit mechanic change.
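The "mechanic plan before code generation" step can be illustrated with a minimal retrieval-and-planning sketch. The data shapes, ranking rule, and archive entries below are all assumptions for illustration; the point is only that mechanics become explicit objects the coding agent must implement, rather than post-hoc descriptions.

```python
from dataclasses import dataclass


@dataclass
class Mechanic:
    """Hypothetical entry in a global mechanic archive."""
    name: str
    description: str
    tags: tuple[str, ...]


def retrieve_mechanics(archive: list[Mechanic],
                       theme_tags: set[str],
                       k: int = 3) -> list[Mechanic]:
    """Rank archived mechanics by tag overlap with the requested theme."""
    ranked = sorted(archive,
                    key=lambda m: len(theme_tags & set(m.tags)),
                    reverse=True)
    return ranked[:k]


def make_mechanic_plan(mechanics: list[Mechanic]) -> str:
    """Render retrieved mechanics as an explicit plan for the coding agent."""
    lines = ["Mechanic plan (implement and preserve across versions):"]
    lines += [f"- {m.name}: {m.description}" for m in mechanics]
    return "\n".join(lines)


# Illustrative archive and query; real entries would come from the
# system's 774-entry mechanic archive.
archive = [
    Mechanic("gravity-flip", "player inverts gravity on tap",
             ("platformer", "physics")),
    Mechanic("combo-meter", "chained actions multiply score",
             ("arcade", "scoring")),
    Mechanic("fog-of-war", "map revealed through exploration",
             ("strategy", "exploration")),
]
plan = make_mechanic_plan(
    retrieve_mechanics(archive, {"platformer", "physics"}, k=2))
```

Feeding the rendered plan to the code-generation agent, and later checking the generated game against it at runtime, is what ties planned mechanics to the reward and to version-to-version records.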