Kimi K2.7-Code cuts thinking tokens 30% — but practitioners say the benchmarks don't check out

VentureBeat / 6/13/2026


Key Points

  • Moonshot AI released the open-source Kimi K2.7-Code, an OpenAI-compatible update to its K2 coding model family that aims to reduce “overthinking,” cutting thinking-token usage by 30% versus K2.6.
  • The model keeps the same trillion-parameter mixture-of-experts architecture as K2.6 and can be deployed via vLLM or SGLang, but it runs only in “thinking mode” and does not support temperature tuning (fixed at 1.0).
  • Moonshot claims sizable benchmark improvements (e.g., on Kimi Code Bench v2, Program Bench, and MLS Bench Lite), but these are proprietary benchmarks controlled by Moonshot, raising questions about external validity.
  • Independent benchmarking by researcher Elliot Arledge on KernelBench-Hard suggests K2.7 may be “more honest but not more capable,” and it has not been submitted to DeepSWE, an independent coding benchmark often used to compare model competency.
  • In practice, teams that route models through API gateways may see inference-cost savings if the 30% reduction holds, but they may also need to re-validate performance on third-party benchmarks relevant to their routing decisions.
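
To make the cost point concrete, here is a rough sketch of how a 30% cut in thinking tokens might translate into a change in total spend. The token counts and per-million-token prices are hypothetical placeholders, not Moonshot's pricing, and the sketch assumes thinking tokens are billed at the output-token rate, which the summary above does not confirm.

```python
def request_cost(prompt_tokens, thinking_tokens, answer_tokens,
                 input_price_per_m, output_price_per_m):
    """Cost of one request in dollars.

    Assumption: thinking tokens are billed like ordinary output tokens.
    Prices are expressed per million tokens.
    """
    input_cost = prompt_tokens * input_price_per_m / 1_000_000
    output_cost = (thinking_tokens + answer_tokens) * output_price_per_m / 1_000_000
    return input_cost + output_cost


def savings_fraction(prompt_tokens, thinking_tokens, answer_tokens,
                     input_price_per_m, output_price_per_m,
                     thinking_reduction=0.30):
    """Fraction of the per-request bill saved when only thinking tokens shrink."""
    before = request_cost(prompt_tokens, thinking_tokens, answer_tokens,
                          input_price_per_m, output_price_per_m)
    after = request_cost(prompt_tokens,
                         thinking_tokens * (1 - thinking_reduction),
                         answer_tokens,
                         input_price_per_m, output_price_per_m)
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical workload: 2,000 prompt, 4,000 thinking, 1,000 answer tokens,
    # at $1.00 (input) and $4.00 (output) per million tokens.
    saved = savings_fraction(2000, 4000, 1000, 1.0, 4.0)
    print(f"Estimated bill reduction: {saved:.1%}")
```

Under these made-up numbers the bill falls by roughly 22%, not 30%, because prompt and answer tokens are unchanged. The actual figure depends on how much of a given workload's spend goes to thinking tokens.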
