Kimi K2.7-Code cuts thinking tokens 30% — but practitioners say the benchmarks don't check out

VentureBeat / 6/13/2026


Key Points

Moonshot AI released the open-source Kimi K2.7-Code, an OpenAI-compatible update to its K2 coding model family that aims to reduce “overthinking,” cutting thinking-token usage by 30% versus K2.6.
The model keeps the same trillion-parameter mixture-of-experts architecture as K2.6 and can be deployed via vLLM or SGLang, but it runs only in “thinking mode” and does not support temperature tuning (fixed at 1.0).
Moonshot claims sizable benchmark improvements (e.g., on Kimi Code Bench v2, Program Bench, and MLS Bench Lite), but these are proprietary benchmarks controlled by Moonshot, raising questions about external validity.
Independent benchmarking by researcher Elliot Arledge on KernelBench-Hard suggests K2.7 may be “more honest but not more capable,” and it has not been submitted to DeepSWE, an independent coding benchmark often used to compare model competency.
In practice, teams that route models through API gateways may see lower inference costs if the 30% reduction holds, but they may also need to re-validate performance on third-party benchmarks relevant to their routing decisions.
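
To make the cost point concrete, here is a rough back-of-the-envelope sketch of how a 30% cut in thinking tokens might translate into output-token spend. It assumes thinking tokens are billed at the same rate as other output tokens, which is an assumption rather than anything stated in the article. The token counts and the price are hypothetical placeholders, not Moonshot figures.

```python
def estimated_savings(thinking_tokens, answer_tokens,
                      price_per_million_output, reduction=0.30):
    """Estimate output-token cost for one request before and after a
    cut in thinking tokens.

    Assumption (not from the article): thinking tokens are billed at the
    same per-token rate as the final answer tokens.
    Returns (old_cost, new_cost, fraction_saved).
    """
    old_total = thinking_tokens + answer_tokens
    # Only the thinking portion shrinks; the answer length is held constant.
    new_total = thinking_tokens * (1 - reduction) + answer_tokens
    old_cost = old_total * price_per_million_output / 1_000_000
    new_cost = new_total * price_per_million_output / 1_000_000
    return old_cost, new_cost, 1 - new_cost / old_cost


# Hypothetical workload: 8,000 thinking tokens and 2,000 answer tokens per
# request, at a made-up price of $2.50 per million output tokens.
old_cost, new_cost, saved = estimated_savings(8_000, 2_000, 2.50)
print(f"before: ${old_cost:.4f}  after: ${new_cost:.4f}  saved: {saved:.0%}")
```

Under these made-up numbers the overall saving is about 24%, not 30%, because only the thinking share of the output shrinks. The smaller the thinking share of a team's typical request, the smaller the realized saving, which is one reason to measure on real traffic before changing routing rules.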
