Qwen3.5-27B performs almost on par with 397B and GPT-5 mini in the Game Agent Coding League
Reddit r/LocalLLaMA / 3/15/2026
📰 News · Signals & Early Trends · Models & Research

Hi LocalLLaMA. Here are the results from the March run of the GACL. A few observations from my side:
For context, GACL is a league where models generate agent code to play seven different games. Each model produces two agents, and each agent competes against every other agent except its paired “friendly” agent from the same model. In other words, the models themselves don’t play the games; they generate the agents that do. Only the top-performing agent from each model is considered when building the leaderboards. All game logs, scoreboards, and generated agent code are available on the league page.
Key Points
- The March run of the Game Agent Coding League shows GPT-5.4 in the lead, with Qwen3.5-27B finishing only 0.04 points behind the 397B model.
- Qwen3.5-27B outperforms every other Qwen model and trails only the 397B model, a standout result for a 27B model in the league’s agent-coding benchmark.
- In GACL, each model generates two agents that compete across seven games; only the top-performing agent from each model counts toward the leaderboard, and all game logs, scoreboards, and generated agent code are publicly available.
- The results continue a trend of smaller, open-weight models closing the capability gap with much larger models, suggesting ongoing efficiency gains.
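The pairing and leaderboard rules described above can be sketched in a few lines. This is a hypothetical illustration of the format as the post describes it, not the league's actual code; the `schedule` and `leaderboard` helpers, the agent dictionaries, and the scores are all my own invention.

```python
# Sketch of the GACL format: every agent plays every other agent except
# its same-model "friendly" partner, and only each model's best agent
# appears on the leaderboard. All names and scores here are made up.
from itertools import combinations


def schedule(agents):
    """All cross-model pairings; same-model (friendly) pairs are skipped."""
    return [
        (a, b)
        for a, b in combinations(agents, 2)
        if a["model"] != b["model"]  # exclude the paired friendly agent
    ]


def leaderboard(agents):
    """Keep only the top-scoring agent per model, ranked by score."""
    best = {}
    for agent in agents:
        cur = best.get(agent["model"])
        if cur is None or agent["score"] > cur["score"]:
            best[agent["model"]] = agent
    return sorted(best.values(), key=lambda a: a["score"], reverse=True)


agents = [
    {"model": "Qwen3.5-27B", "name": "qwen27b-a1", "score": 9.1},
    {"model": "Qwen3.5-27B", "name": "qwen27b-a2", "score": 8.7},
    {"model": "GPT-5 mini", "name": "gpt5mini-a1", "score": 9.0},
    {"model": "GPT-5 mini", "name": "gpt5mini-a2", "score": 9.2},
]

# 4 agents give 6 possible pairs; the 2 same-model pairs are dropped.
print(len(schedule(agents)))                     # 4
print([a["name"] for a in leaderboard(agents)])  # ['gpt5mini-a2', 'qwen27b-a1']
```

With two agents per model, this pairing rule means each agent plays 2·(M−1) matches in an M-model league, and a model's weaker agent never drags down its leaderboard entry.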