MiMo-V2.5-Pro - the actual best open-weights model

Reddit r/LocalLLaMA / 5/1/2026

Key Points

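MiMo-V2.5-Pro ranks near the top of the author's own benchmark of autonomous Blood on the Clocktower games, joining Kimi K2.6 as a dominant open-weights model.
Its win rate is skewed toward the Good team (Good 88% / Evil 48%), which keeps it from reaching the top spot in its class.
Kimi K2.6 has very verbose reasoning, costing $2.65 per game with matches taking roughly 10-15 hours, which feels heavy for practical use. MiMo-V2.5-Pro costs $0.99 per game and finishes in about 2-3 hours, giving it the edge on cost-effectiveness.
MiMo-V2.5-Pro has a 0.4% tool call error rate, making it fairly reliable. The author concludes it is the best-value model at the top end of the group.
Game links illustrate notable good plays (reasoning from other players' perspectives, winning through clean deductions) and notable mistakes (wrongly expecting an evil Baron to self-reveal, a Minion apparently confessing its role).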

MiMo-V2.5-Pro - the actual best open-weights model

Following an impressive shake-up by Kimi K2.6, I've now got some results for Xiaomi's MiMo-V2.5-Pro.

For context, this is based on a benchmark I've created that pits models against each other in autonomous games of Blood on the Clocktower - a highly complex social deduction game. If you're unfamiliar, it's like Mafia/Werewolf or The Traitors TV show.

MiMo-V2.5-Pro joins Kimi K2.6 as another dominant player, with both models pulling away from the crowd in their own class. Note that I have not yet benchmarked GPT 5.5 (Xhigh) or Claude Opus 4.7 (Max), which may also be in this range.

Interestingly, its win rate is a bit lopsided (Good 88% / Evil 48%): it has an extremely high Good-team win rate but a weaker Evil-team win rate, which holds it back from the top spot.
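To put a number on how lopsided that is, here is a minimal sketch using only the two win rates quoted above. The unweighted mean assumes an equal number of games on each team, which the post does not state, and it is not necessarily how the benchmark ranks models.

```python
# Win rates quoted in the post.
good_win_rate = 0.88
evil_win_rate = 0.48

# Gap between the two teams, in percentage points.
gap_points = round((good_win_rate - evil_win_rate) * 100)

# ASSUMPTION: equal number of games played as Good and as Evil.
# The benchmark's real weighting and scoring are not described in the post.
unweighted_mean = (good_win_rate + evil_win_rate) / 2

print(f"Good/Evil gap: {gap_points} percentage points")
print(f"Unweighted mean win rate: {unweighted_mean:.0%}")
```

A 40-point gap between teams is large enough that the Evil-side result, rather than the Good-side one, is what limits the model's overall standing.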

Why MiMo-V2.5-Pro over Kimi K2.6?

Kimi K2.6 has incredibly verbose reasoning at 580,000 average output tokens per game, leading to a $2.65/game cost. This also means long response times, with matches taking around 10-15 hours to complete. It feels a bit impractical for many use cases.

MiMo-V2.5-Pro, on the other hand, while still somewhat verbose at 183,639 tokens per game (similar to Gemini 3.1 Pro), costs less than half as much at a cooler $0.99/game. On the high end, Claude Opus 4.6 costs $3.76/game. Matches also typically finish in around 2-3 hours (if not playing against Kimi).
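The ratios behind "less than half as much" can be checked with a short sketch. It uses only the per-game figures quoted above; the dictionary layout and the `cost_ratio` helper are illustrative names, not part of the benchmark.

```python
# Per-game figures quoted in the post. The output token count for
# Claude Opus 4.6 is not given, so it is left as None.
models = {
    "Kimi K2.6": {"cost_usd": 2.65, "output_tokens": 580_000},
    "MiMo-V2.5-Pro": {"cost_usd": 0.99, "output_tokens": 183_639},
    "Claude Opus 4.6": {"cost_usd": 3.76, "output_tokens": None},
}


def cost_ratio(a: str, b: str) -> float:
    """Per-game cost of model a as a fraction of model b's (hypothetical helper)."""
    return models[a]["cost_usd"] / models[b]["cost_usd"]


mimo_vs_kimi_cost = cost_ratio("MiMo-V2.5-Pro", "Kimi K2.6")
mimo_vs_opus_cost = cost_ratio("MiMo-V2.5-Pro", "Claude Opus 4.6")
mimo_vs_kimi_tokens = (
    models["MiMo-V2.5-Pro"]["output_tokens"] / models["Kimi K2.6"]["output_tokens"]
)

print(f"MiMo cost vs Kimi K2.6:       {mimo_vs_kimi_cost:.0%}")
print(f"MiMo cost vs Claude Opus 4.6: {mimo_vs_opus_cost:.0%}")
print(f"MiMo output tokens vs Kimi:   {mimo_vs_kimi_tokens:.0%}")
```

On these numbers MiMo-V2.5-Pro costs a bit over a third of Kimi K2.6 per game while emitting a bit under a third of the output tokens. The two ratios differ, which suggests per-game cost is not determined by output token count alone.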

It is also fairly reliable with a 0.4% tool call error rate.

This currently places it as the best value model at the top-end of the group.

Notable moves:

Thinking from the perspective of other players (image 3 - vs GPT 5.5): https://clocktower-radio.com/games/Qxtya8U#event-67
Clean deductions win the game: https://clocktower-radio.com/games/kIoFzhP#event-251

Notable mistakes:

Expected an evil Baron to self-reveal, leading to a loss (image 4 - vs Claude Opus 4.6): https://clocktower-radio.com/games/g4sY9MP#event-126
Minion confessing their role (?): https://clocktower-radio.com/games/Q1kdi8D#event-85

MiMo-V2.5-Pro transcripts: https://clocktower-radio.com/search?a=MiMo-V2.5-Pro

How-it-works: https://clocktower-radio.com/how-it-works

submitted by /u/cjami