Following an impressive shake-up by Kimi K2.6, I've now got some results for Xiaomi's MiMo-V2.5-Pro.

For context, this is based on a benchmark I've created that pits models against each other in autonomous games of Blood on the Clocktower, a highly complex social deduction game. If you're unfamiliar, it's like Mafia/Werewolf or The Traitors TV show.

MiMo-V2.5-Pro joins Kimi K2.6 as another dominant player, with both models pulling away from the crowd into a class of their own. Note that I have not yet benched GPT 5.5 (Xhigh) or Claude Opus 4.7 (Max), which may also land in this area.

Interestingly, its win rate is a bit lopsided (Good 88% / Evil 48%): an extremely high good-team win rate, but a poorer evil-team win rate that holds it back from the top spot.

Why MiMo-V2.5-Pro over Kimi K2.6? Kimi K2.6 has incredibly verbose reasoning at 580,000 average output tokens per game, leading to a $2.65/game cost. This also leads to long response times, with matches taking around 10-15 hours to complete. It feels a bit impractical for many use cases.

MiMo-V2.5-Pro, on the other hand, while slightly verbose at 183,639 tokens per game (similar to Gemini 3.1 Pro verbosity), costs less than half as much at a cooler $0.99/game. On the high end, Claude Opus 4.6 costs $3.76/game. Matches also usually finish in a typical 2-3 hours (if not vs Kimi). It is also fairly reliable, with a 0.4% tool call error rate. This currently places it as the best value model at the top end of the group.

Notable moves:
Notable mistakes:
MiMo-V2.5-Pro transcripts: https://clocktower-radio.com/search?a=MiMo-V2.5-Pro

How it works: https://clocktower-radio.com/how-it-works
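
As a rough sanity check on the value claim, the per-game figures quoted in the post can be compared with a few lines of Python. This is only a sketch: the numbers are copied from the post, the variable names are illustrative and not part of the benchmark, and the per-game cost presumably covers more than output tokens, so the ratios are not per-token prices.

```python
# Per-game figures as quoted in the post; nothing here is measured independently.
cost_usd = {
    "Kimi K2.6": 2.65,
    "MiMo-V2.5-Pro": 0.99,
    "Claude Opus 4.6": 3.76,
}
# Output tokens per game are only quoted for these two models.
output_tokens = {
    "Kimi K2.6": 580_000,
    "MiMo-V2.5-Pro": 183_639,
}

# How MiMo-V2.5-Pro compares with Kimi K2.6 on cost and verbosity.
cost_ratio = cost_usd["MiMo-V2.5-Pro"] / cost_usd["Kimi K2.6"]
token_ratio = output_tokens["MiMo-V2.5-Pro"] / output_tokens["Kimi K2.6"]

# Gap between the quoted good-team and evil-team win rates, in percentage points.
win_rate_gap = 88 - 48

print(f"cost vs Kimi K2.6:   {cost_ratio:.0%}")
print(f"tokens vs Kimi K2.6: {token_ratio:.0%}")
print(f"Good/Evil gap:       {win_rate_gap} points")
```

On these figures, MiMo-V2.5-Pro comes out at roughly 37% of Kimi K2.6's per-game cost and about 32% of its output tokens, which is consistent with the "less than half as much" claim.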
MiMo-V2.5-Pro - the actual best open-weights model
Reddit r/LocalLLaMA / 5/1/2026
💬 Opinion · Signals & Early Trends · Industry & Market Moves · Models & Research
Key Points
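- MiMo-V2.5-Pro ranks near the top of the author's benchmark of autonomous "Blood on the Clocktower" games, positioned as a "dominant" open-weights model alongside Kimi K2.6.
- Its record is skewed toward the good team (Good 88% / Evil 48%), which keeps it from reaching the very top of its category.
- Kimi K2.6's reasoning is very verbose, costing $2.65 per game with matches taking around 10-15 hours, which feels heavy in practice. MiMo-V2.5-Pro costs $0.99 per game and finishes in around 2-3 hours, giving it the edge on value.
- MiMo-V2.5-Pro is relatively reliable, with a 0.4% tool call error rate, and the author concludes it is the best-value model at the top end of the group.
- Specific good examples (reasoning from another player's perspective, a win through clean deduction) and bad examples (an evil Baron's miscalculation that backfires, mistakes such as revealing a role) are provided as match links.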