Differences Between Kimi K2.5 and Kimi K2.6 on MineBench

Reddit r/LocalLLaMA / 2026/4/22

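Key Points

This post compares Kimi K2.5 and Kimi K2.6 on MineBench, a benchmark that evaluates how well a model can generate 3D Minecraft-like structures.
The poster notes that Kimi's results can be inconsistent: the model has a high ceiling, but some of its builds are lower in quality than others.
The poster concludes that all of the Kimi K2.6 builds, including the weaker ones, are a massive improvement over Kimi K2.5, while indicating that K2.6's quality varies from build to build.
The total cost of the benchmark run is reported as $2.35, and the poster argues that Kimi is by far the most cost-effective model for its performance.
The post links to MineBench and its GitHub repository and also refers to previous model comparison posts.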

Differences Between Kimi K2.5 and Kimi K2.6 on MineBench

Some Notes:

The one caveat though is that I find Kimi's results to be quite inconsistent; the model clearly has a very high ceiling, but you'll see that some of its builds (in my opinion) are lacking in quality compared to the others (though they're all a massive improvement from Kimi K2.5)
Total cost was $2.35
- Think this is by far the most cost-effective model for its performance
- If you enjoy these posts please feel free to help fund the benchmark

Benchmark: https://minebench.ai/
Git Repository: https://github.com/Ammaar-Alam/minebench

Previous Posts:

Extra Information (if you're confused):

Essentially it's a benchmark that tests how well a model can create a 3D Minecraft-like structure.

The models are given a palette of blocks (think of them like Legos) and a prompt of what to build; the first prompt you see in the post, for example, was a fighter jet. The models then had to build a fighter jet by returning JSON that gives the coordinates (x, y, z) of each block. It's interesting to see which model is able to create a better 3D representation of the given prompt.
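To make the "JSON of block coordinates" idea concrete, here is a rough sketch of what checking a model's answer could look like. Everything specific in it is an assumption for illustration only: the `blocks` / `type` field names, the tiny `PALETTE`, the `max_coord` bound, and the `validate_build` helper are hypothetical and are not taken from the MineBench repository, whose actual schema may differ.

```python
import json

# Hypothetical palette; the real benchmark's block list is not shown in the post.
PALETTE = {"stone", "glass", "iron_block"}


def validate_build(raw_json, palette, max_coord=63):
    """Parse a model's JSON answer and split it into usable blocks and problems.

    Assumed (not confirmed) schema:
        {"blocks": [{"x": int, "y": int, "z": int, "type": str}, ...]}
    """
    data = json.loads(raw_json)
    blocks, problems = [], []
    seen = set()  # positions already occupied by an accepted block
    for i, b in enumerate(data["blocks"]):
        pos = (b["x"], b["y"], b["z"])
        if b["type"] not in palette:
            problems.append(f"block {i}: type {b['type']!r} is not in the palette")
        elif not all(isinstance(c, int) and 0 <= c <= max_coord for c in pos):
            problems.append(f"block {i}: position {pos} is outside the grid")
        elif pos in seen:
            problems.append(f"block {i}: position {pos} is already occupied")
        else:
            seen.add(pos)
            blocks.append((pos, b["type"]))
    return blocks, problems


# A made-up four-block answer: two valid blocks, one overlap, one unknown type.
example = """{"blocks": [
  {"x": 0, "y": 0, "z": 0, "type": "iron_block"},
  {"x": 1, "y": 0, "z": 0, "type": "glass"},
  {"x": 1, "y": 0, "z": 0, "type": "stone"},
  {"x": 2, "y": 0, "z": 0, "type": "lava"}
]}"""

if __name__ == "__main__":
    blocks, problems = validate_build(example, PALETTE)
    print(f"{len(blocks)} valid blocks")
    for p in problems:
        print("problem:", p)
```

A check like this only tells you whether an answer is well-formed; judging which build is the better fighter jet is still a matter of looking at the rendered result.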

The smarter models tend to design much more detailed and intricate builds. The repository readme might help give a better understanding.

(Disclaimer: This is a public benchmark I created, so technically self-promotion :)

submitted by /u/ENT_Alam