(Interactive) OpenCode Racing Game Comparison: Qwen3.6 35B vs Qwen3.5 122B vs Qwen3.5 27B vs Qwen3.5 4B vs Gemma 4 31B vs Gemma 4 26B vs Qwen3 Coder Next vs GLM 4.7 Flash

Reddit r/LocalLLaMA / 4/21/2026



Key Points

The interactive page lets users play the racing games that different LLMs built in OpenCode, so they can compare how each model generates and modifies the game code.
The creator explains the methodology: vision is disabled, the same initial prompt is sent in Plan mode, Playwright MCP is enabled so the models can run and play the game, and three follow-up turns are used to point out issues to the models.
Notable behavior differences include Qwen3 Coder Next building a track out of invisible walls, Gemma 4 31B and Qwen3.5 27B outputting the full code on every turn, and Qwen3.5 27B producing its best result on the last turn after accidentally disabling Playwright MCP.
Other observations highlight unique features per model, such as Gemma 4 26B adding sound and spawning a subagent, and GLM 4.7 Flash using a subagent during planning.
The write-up also mentions limitations and “what I would do differently,” including not disabling vision and preserving/showing all HTML versions for better reproducibility and comparison.


You can play them here: https://fatheredpuma81.github.io/LLM_Racing_Games/

This started out as a simple test of Qwen3 Coder Next vs Qwen3.5 4B, because they have similar benchmark numbers. Then I just kept trying other models and decided I might as well share it, even if I'm not that happy with how I did it.

Read "How this works" in the top right if you want to know how it was done, but the TL;DR is: I disabled vision, sent the same initial prompt in Plan mode, enabled Playwright MCP and sent the same start prompt, and then spent 3 turns testing the games and pointing out the issues I saw to the LLMs.

There are a ton of things I'd do differently if I ever got around to redoing this. For one, I'd keep and show all 4 versions of the HTML. I also wouldn't disable vision, which hindered Qwen 27B a ton (it was only disabled for an apples-to-apples comparison between 4B and Coder Next). I had a bunch more thoughts on it, but I'm too tired to remember them.

Some interesting notes:

Qwen3 Coder Next's game does appear to have a track, but it's made up of invisible walls.
Gemma 4 31B and Qwen3.5 27B both output the full code on every turn while the rest all primarily edited the code.
Gemma 4 31B's game actually had a road at one point.
For Qwen3.5 27B, accidentally disabling Playwright MCP on the final turn is what gave us a car that actually moves and steers at a decent speed. Otherwise, the only real change between the 1st HTML and the last was that it added trees.
Qwen3.5 27B is the only one with tires that turn. Not that you can see it.
Gemma 4 26B was the only one to add sound.
Gemma 4 26B added a Team Rocket car "blasting off again" whenever you touched a wall, but OpenCode more or less crashed in the middle of it. I had to roll back, which resulted in the less interesting sound version.
GLM 4.7 Flash and Gemma 4 26B were the only ones to spawn a subagent. GLM used it for research during Planning and Gemma used it to implement sound on the final turn.
I found out that GLM 4.7 Flash breaks when run with Q8_0 K-cache quantization.
Qwen3.5 4B installed its own version of Playwright using NPX and then it started using both on bugfix turn 2/3.
GLM 4.7 Flash's final output rendered as just a white screen, so I jumped back a turn and asked it to output the full code again. So it only got 2 turns, I guess?
Qwen3.6 35B's game actually regressed in a lot of ways from its first version. That version had no screen jitter, a much narrower track, and hit boxes that lined up exactly with the walls. Its minimap was a lot more broken, though. I think it confused the minimap track with the physical track.
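Two of these notes are Qwen3 Coder Next's invisible walls and Qwen3.6 35B mixing up the minimap track with the physical track. Both describe collision, drawing, and the minimap disagreeing about where the road is. One plausible way to rule that out is to derive all three from a single track definition. Below is a minimal, hypothetical sketch in Python. It assumes a track stored as a closed centerline polyline with a fixed road half-width. `Track`, `dist_to_segment`, `on_road`, and `to_minimap` are illustrative names and are not taken from any of the generated games, which are HTML/JavaScript and may store their tracks very differently.

```python
from dataclasses import dataclass
import math

# Hypothetical track representation: a closed centerline loop plus a fixed road
# half-width. Physics, drawing, and the minimap would all read from this one object.
@dataclass
class Track:
    centerline: list[tuple[float, float]]  # (x, y) world points; last connects to first
    half_width: float                      # distance from the centerline to each wall

def dist_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping the projection to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def on_road(track, p):
    """Collision check: True if p is within half_width of some centerline segment."""
    pts = track.centerline
    n = len(pts)
    nearest = min(dist_to_segment(p, pts[i], pts[(i + 1) % n]) for i in range(n))
    return nearest <= track.half_width

def to_minimap(track, p, size):
    """Map a world point into a size x size minimap using the track's own bounds."""
    xs = [x for x, _ in track.centerline]
    ys = [y for _, y in track.centerline]
    min_x = min(xs) - track.half_width
    min_y = min(ys) - track.half_width
    span = max(max(xs) - min(xs), max(ys) - min(ys)) + 2 * track.half_width
    scale = size / span  # one uniform scale so the minimap is not distorted
    return ((p[0] - min_x) * scale, (p[1] - min_y) * scale)

# Usage: a 100 x 100 square loop with a 20-unit-wide road.
track = Track(centerline=[(0, 0), (100, 0), (100, 100), (0, 100)], half_width=10)
print(on_road(track, (50, 5)))          # True: 5 units from the bottom straight
print(on_road(track, (50, 50)))         # False: in the infield, 50 units from every edge
print(to_minimap(track, (0, 0), 120))   # (10.0, 10.0)
```

Because the wall test and the minimap both read `centerline` and `half_width`, the minimap cannot drift away from the road the car collides with. A renderer that draws the same data likewise cannot leave walls invisible.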