Differences Between Kimi K2.5 and Kimi K2.6 on MineBench

Reddit r/LocalLLaMA / 2026/4/22

💬 Opinion / Tools & Practical Usage / Models & Research

Key Points

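  • This post compares Kimi K2.5 and Kimi K2.6 on MineBench, a benchmark that evaluates how well a model can generate 3D Minecraft-like structures.
  • The poster says Kimi's results can be inconsistent: the model has a high ceiling, but some of its builds are lower in quality than others.
  • The poster concludes that all of Kimi K2.6's builds are a massive improvement over Kimi K2.5, while noting that K2.6's quality can vary from build to build.
  • The total cost of the benchmark run is reported as $2.35, and the poster argues that Kimi is by far the most cost-effective model for its performance.
  • The post links to MineBench and its GitHub repository, and refers to earlier model comparison posts.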
Differences Between Kimi K2.5 and Kimi K2.6 on MineBench

Some Notes:

  • The one caveat is that I find Kimi's results to be quite inconsistent; the model clearly has a very high ceiling, but you'll see that some of its builds (in my opinion) are lacking in quality compared to the others (though they're all a massive improvement over Kimi K2.5)
  • Total cost was $2.35
    • I think this is by far the most cost-effective model for its performance
    • If you enjoy these posts please feel free to help fund the benchmark

Benchmark: https://minebench.ai/
Git Repository: https://github.com/Ammaar-Alam/minebench

Previous Posts:

Extra Information (if you're confused):

Essentially, it's a benchmark that tests how well a model can create a 3D Minecraft-like structure.

The models are given a palette of blocks (think of them like Lego bricks) and a prompt describing what to build; the first prompt you see in the post, for example, was a fighter jet. The models then have to build it by returning JSON that gives the coordinates (x, y, z) of each block. It's interesting to see which model creates the better 3D representation of the given prompt.
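
To make the setup a bit more concrete, here is a rough sketch of what checking a model's reply could look like. This is not MineBench's actual schema or code: the `blocks` key, the `type` field, the palette names, the 64-block grid size, and the `validate_build` helper are all assumptions made up for illustration. The repository has the real format.

```python
import json

# Hypothetical palette; the real MineBench palette and block names may differ.
PALETTE = {"stone", "glass", "iron_block"}


def validate_build(raw: str, palette: set[str], size: int = 64) -> list[dict]:
    """Parse a model's JSON reply and return its list of blocks.

    Raises ValueError for out-of-range coordinates, block types that are
    not in the palette, or two blocks placed at the same position.
    The schema ({"blocks": [{"x", "y", "z", "type"}]}) is an assumption.
    """
    data = json.loads(raw)
    blocks = data["blocks"]
    seen = set()
    for block in blocks:
        pos = (block["x"], block["y"], block["z"])
        # Every coordinate must be an integer inside the assumed build grid.
        if not all(isinstance(c, int) and 0 <= c < size for c in pos):
            raise ValueError(f"coordinate out of bounds: {pos}")
        if block["type"] not in palette:
            raise ValueError(f"block type not in palette: {block['type']}")
        if pos in seen:
            raise ValueError(f"duplicate position: {pos}")
        seen.add(pos)
    return blocks


# A tiny two-block "build" in the assumed format.
reply = (
    '{"blocks": ['
    '{"x": 0, "y": 0, "z": 0, "type": "stone"}, '
    '{"x": 1, "y": 0, "z": 0, "type": "glass"}]}'
)
blocks = validate_build(reply, PALETTE)
print(len(blocks))
```

A check like this only tells you whether a reply is well-formed; judging whether the result actually looks like a fighter jet is the part the benchmark itself is about.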

The smarter models tend to design much more detailed and intricate builds. The repository README might help give a better understanding.

(Disclaimer: This is a public benchmark I created, so technically self-promotion :)

submitted by /u/ENT_Alam