(Interactive) OpenCode Racing Game Comparison: Qwen3.6 35B vs Qwen3.5 122B vs Qwen3.5 27B vs Qwen3.5 4B vs Gemma 4 31B vs Gemma 4 26B vs Qwen3 Coder Next vs GLM 4.7 Flash

Reddit r/LocalLLaMA / 4/21/2026


Key Points

  • The interactive page lets users play a set of LLM-driven “racing game” implementations produced by different models, comparing how they generate and modify the game code.
  • The creator explains the methodology: vision is disabled, the same initial prompt is used in Plan mode, Playwright MCP is enabled to run/play the game, and multiple turns of prompting are used to surface issues to the models.
  • Notable behavior differences include Qwen3 Coder Next seemingly using invisible-wall tracks, Gemma 4 31B and Qwen3.5 27B outputting full code each turn, and Qwen3.5 27B accidentally succeeding on the last turn due to disabling Playwright MCP.
  • Other observations highlight unique features per model, such as Gemma 4 26B adding sound and spawning a subagent, and GLM 4.7 Flash using a subagent during planning.
  • The write-up also mentions limitations and “what I would do differently,” including not disabling vision and preserving/showing all HTML versions for better reproducibility and comparison.

You can play them here: https://fatheredpuma81.github.io/LLM_Racing_Games/

This started out as a simple test of Qwen3 Coder Next vs Qwen3.5 4B, because they have similar benchmark numbers. Then I just kept trying other models and decided I might as well share it, even if I'm not that happy with how I did it.

Read the "How this works" in the top right if you want to know how it was done, but the TLDR is: disabled vision, sent the same initial prompt in Plan mode, enabled Playwright MCP and sent the same start prompt, and then spent 3 turns testing the games and pointing out the issues I saw to the LLMs.
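For anyone who wants to repeat the setup, the TLDR above can be written down as a small turn schedule. This is only a sketch of how I read the description, not the actual harness: the `Turn` class, `build_schedule`, the "build" mode name, and the assumption that Playwright MCP was off during the Plan turn are all hypothetical. The prompts are placeholders, since the real ones live on the "How this works" page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    label: str
    mode: str             # "plan" or "build" (mode names are assumed)
    playwright_mcp: bool  # whether Playwright MCP is enabled on this turn
    prompt: str


def build_schedule(initial_prompt, start_prompt, bugfix_turns=3):
    """Return the ordered turns every model goes through (hypothetical helper)."""
    turns = [
        # Same initial prompt for every model, sent in Plan mode.
        # Playwright MCP being off here is an assumption.
        Turn("plan", "plan", False, initial_prompt),
        # Playwright MCP enabled, same start prompt for every model.
        Turn("start", "build", True, start_prompt),
    ]
    for i in range(1, bugfix_turns + 1):
        # Feedback turns: play the game, then report the issues seen.
        turns.append(
            Turn(f"bugfix {i}/{bugfix_turns}", "build", True, "<issues seen while playing>")
        )
    return turns


schedule = build_schedule("<initial prompt>", "<start prompt>")
# Every non-plan turn can produce a new HTML file.
html_versions = [t.label for t in schedule if t.mode != "plan"]
```

With the default of 3 bugfix turns, that gives one start turn plus three bugfix turns, which lines up with the 4 HTML versions per model.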

There's a ton of things I'd do differently if I ever got around to redoing this: keeping and showing all 4 versions of the HTML, for one, and not disabling vision, which hindered Qwen3.5 27B a ton (it was only disabled for an apples-to-apples comparison between 4B and Coder). I had a bunch more thoughts on it, but I'm too tired to remember them.

Some interesting notes:

  • Qwen3 Coder Next's game does appear to have a track but it's made up of invisible walls.
  • Gemma 4 31B and Qwen3.5 27B both output the full code on every turn while the rest all primarily edited the code.
  • Gemma 4 31B's game actually had a road at one point.
  • Qwen3.5 27B accidentally disabling Playwright MCP on the final turn is what gave us a car that actually moves and steers at a decent speed. The only thing that really changed between the 1st HTML and the last was that it added trees.
  • Qwen3.5 27B is the only one with tires that turn. Not that you can see it.
  • Gemma 4 26B was the only one to add sound.
  • Gemma 4 26B added a Team Rocket car blasting off again when you touched a wall, but then OpenCode more or less crashed in the middle of it, so I had to roll back, which resulted in the less interesting sound version.
  • GLM 4.7 Flash and Gemma 4 26B were the only ones to spawn a subagent. GLM used it for research during Planning and Gemma used it to implement sound on the final turn.
  • Found out GLM 4.7 Flash can't do Q8_0 K Cache Quantization without breaking.
  • Qwen3.5 4B installed its own version of Playwright using npx and then started using both on bugfix turn 2/3.
  • GLM 4.7 Flash's final output failed to a white screen, so I jumped back a turn and asked it to output the full code again. So it only got 2 turns, I guess?
  • Qwen3.6 35B's game actually regressed in a lot of ways from where it started. At the start there was no screen jitter, the track was a lot narrower, and the hit boxes were spot on with the walls. The minimap was a lot more broken, though; I think it got confused between the minimap track and the physical track.
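
On that last note, one common way to keep a minimap and the physical track from drifting apart is to derive both from a single list of waypoints rather than maintaining two track definitions. The sketch below illustrates that idea only; it is not taken from any of the generated games, and `TRACK`, `to_minimap`, `minimap_track`, and the world/minimap sizes are made-up names and values.

```python
# Single source of truth: track centerline in world coordinates (made-up values).
TRACK = [(100.0, 100.0), (900.0, 100.0), (900.0, 500.0), (100.0, 500.0)]


def to_minimap(point, world_size, map_size):
    """Scale one world-space point into minimap space."""
    x, y = point
    return (x * map_size[0] / world_size[0], y * map_size[1] / world_size[1])


def minimap_track(world_size=(1000.0, 600.0), map_size=(100.0, 60.0)):
    """Build the minimap outline from the same waypoints the physics uses."""
    return [to_minimap(p, world_size, map_size) for p in TRACK]


outline = minimap_track()
```

Because the minimap outline is computed from `TRACK`, editing the physical track (for example, making it narrower or moving a corner) moves the minimap with it instead of leaving a stale second copy.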
submitted by /u/FatheredPuma81