Kimi K2.6 - the mighty turtle that wins the race

Reddit r/LocalLLaMA / 4/25/2026

Key Points

A tester reports benching the model “Kimi K2.6” using a custom benchmark where models compete in autonomous games of the social deduction game Blood on the Clocktower.
Early results from 64 games show K2.6 dominating the leaderboard with consistent wins, despite being slower than competing models.
The post notes K2.6 is token-heavy, averaging about 570,000 tokens per game (vs. about 180,000 for Gemini 3.1 Pro) and taking roughly 10–15 hours per match (vs. 1–3 hours for an average match), which makes it relatively expensive at $2.31 per game.
Reliability is described as fairly good, with a 0.9% tool-call error rate, and the post highlights specific strong plays and rule-related mistakes.
The post links to game transcripts and explains the evaluation setup, enabling others to review how K2.6 performs in these long-form autonomous interactions.

Hi folks, I've been benching Kimi K2.6 for the past few days, and I'd like to share my findings.

For context, this is based on a benchmark I've created that pits models against each other in autonomous games of Blood on the Clocktower - a highly complex social deduction game.

Findings:

K2.6 has played 64 games so far (2 games per match). These are early results, but it has absolutely dominated the leaderboard through consistent wins against other models.

K2.6 is slow, generating an average of 570,000 tokens per game. Gemini 3.1 Pro, by contrast, generates 180,000 tokens per game. An average match takes about 1-3 hours; with K2.6 it takes about 10-15 hours (using Moonshot AI as the provider).

K2.6 is expensive - mainly due to the high token output, costing $2.31/game. This is still significantly less than Claude Opus 4.6, which costs $3.79/game. GLM 5.1, however, costs a more modest $0.88/game.

Reliability is decent with a 0.9% tool call error rate.
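
To put the numbers above side by side, here is a minimal Python sketch that computes the ratios. It only uses the figures quoted in this post; the variable names are illustrative, and the total spend is an estimate that assumes the $2.31 average holds across all 64 games.

```python
# Figures quoted in the post. Names are illustrative, not from the benchmark code.
TOKENS_PER_GAME = {"Kimi K2.6": 570_000, "Gemini 3.1 Pro": 180_000}
COST_PER_GAME_USD = {"Kimi K2.6": 2.31, "Claude Opus 4.6": 3.79, "GLM 5.1": 0.88}
GAMES_PLAYED = 64
GAMES_PER_MATCH = 2

# How many matches the 64 games correspond to.
matches_played = GAMES_PLAYED // GAMES_PER_MATCH

# Token output of K2.6 relative to Gemini 3.1 Pro.
token_ratio = TOKENS_PER_GAME["Kimi K2.6"] / TOKENS_PER_GAME["Gemini 3.1 Pro"]

# Per-game cost of K2.6 relative to the other two priced models.
cost_vs_opus = COST_PER_GAME_USD["Kimi K2.6"] / COST_PER_GAME_USD["Claude Opus 4.6"]
cost_vs_glm = COST_PER_GAME_USD["Kimi K2.6"] / COST_PER_GAME_USD["GLM 5.1"]

# Estimated total spend on K2.6 so far (assumption: the average applies to every game).
total_cost = GAMES_PLAYED * COST_PER_GAME_USD["Kimi K2.6"]

print(f"matches played:        {matches_played}")
print(f"tokens vs Gemini:      {token_ratio:.2f}x")
print(f"cost vs Claude Opus:   {cost_vs_opus:.2f}x")
print(f"cost vs GLM:           {cost_vs_glm:.2f}x")
print(f"estimated total spend: ${total_cost:.2f}")
```

In other words, K2.6 produces roughly 3.2x the tokens of Gemini 3.1 Pro per game, costs about 61% of what Claude Opus 4.6 does, and about 2.6x what GLM 5.1 does.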

Notable moves:

Rejecting manipulation from Claude Opus 4.6 (shown in image): https://clocktower-radio.com/games/IyLrh8Q#event-79
Minion self-sacrifice to get the Demon to the last 2: https://clocktower-radio.com/games/Do9NaoQ#event-290

Notable mistakes:

Fumbling with the rules - Empaths do wake on the starting night: https://clocktower-radio.com/games/6C4GDCU#event-38
Accidentally whispering their evil plot to the good side (although it recovered, gaslit, and won that game): https://clocktower-radio.com/games/XRpvext#event-34

Kimi K2.6 transcripts: https://clocktower-radio.com/search?a=Kimi+K2.6

How-it-works: https://clocktower-radio.com/how-it-works

submitted by /u/cjami