Qwen3.5-27B performs almost on par with 397B and GPT-5 mini in the Game Agent Coding League
Reddit r/LocalLLaMA / 3/15/2026
📰 News · Signals & Early Trends · Models & Research

Hi LocalLLaMA. Here are the results from the March run of the GACL. A few observations from my side:
For context, GACL is a league where models generate agent code to play seven different games. Each model produces two agents, and each agent competes against every other agent except its paired “friendly” agent from the same model. In other words, the models themselves don’t play the games; they generate the agents that do. Only the top-performing agent from each model is considered when building the leaderboards. All game logs, scoreboards, and generated agent code are available on the league page.
Key Points
- The March run of the Game Agent Coding League shows GPT-5.4 in the lead, with Qwen3.5-27B finishing only 0.04 points behind the 397B model.
- Qwen3.5-27B outperforms every other Qwen model and trails only the 397B model, a standout result for a 27B model in the league’s agent-coding benchmark.
- In GACL, each model generates two agents that compete across seven games; only the top-performing agent from each model counts toward the leaderboard, and all game logs, scoreboards, and generated agent code are publicly available.
- The results continue a trend of smaller, open-weight models closing the capability gap with much larger models, suggesting ongoing efficiency gains.
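The pairing and leaderboard rules described above can be sketched in a few lines. This is a hypothetical illustration of the format as the post describes it, not the league's actual code; the `schedule` and `leaderboard` helpers, the agent dictionaries, and the scores are all my own invention.

```python
# Sketch of the GACL format: every agent plays every other agent except
# its same-model "friendly" partner, and only each model's best agent
# appears on the leaderboard. All names and scores here are made up.
from itertools import combinations


def schedule(agents):
    """All cross-model pairings; same-model (friendly) pairs are skipped."""
    return [
        (a, b)
        for a, b in combinations(agents, 2)
        if a["model"] != b["model"]  # exclude the paired friendly agent
    ]


def leaderboard(agents):
    """Keep only the top-scoring agent per model, ranked by score."""
    best = {}
    for agent in agents:
        cur = best.get(agent["model"])
        if cur is None or agent["score"] > cur["score"]:
            best[agent["model"]] = agent
    return sorted(best.values(), key=lambda a: a["score"], reverse=True)


agents = [
    {"model": "Qwen3.5-27B", "name": "qwen27b-a1", "score": 9.1},
    {"model": "Qwen3.5-27B", "name": "qwen27b-a2", "score": 8.7},
    {"model": "GPT-5 mini", "name": "gpt5mini-a1", "score": 9.0},
    {"model": "GPT-5 mini", "name": "gpt5mini-a2", "score": 9.2},
]

# 4 agents give 6 possible pairs; the 2 same-model pairs are dropped.
print(len(schedule(agents)))                     # 4
print([a["name"] for a in leaderboard(agents)])  # ['gpt5mini-a2', 'qwen27b-a1']
```

With two agents per model, this pairing rule means each agent plays 2·(M−1) matches in an M-model league, and a model's weaker agent never drags down its leaderboard entry.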