Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6

Disclosure: Some links in this article are affiliate links. AI Maestro may earn a commission if you make a purchase, at no…

By AI Maestro June 13, 2026 4 min read

For developers and engineers building complex software, the release of Moonshot AI’s Kimi K2.7-Code signals a shift in how agentic coding is approached. This is not merely a chatbot with a keyboard; it is a specialised engine designed to plan, edit, and debug across long-horizon tasks without losing the thread. For makers, the implication is clear: if you are running multi-step refactors or complex tool-use workflows, the cost-per-task is dropping while the reliability is rising. The model arrives with weights available for self-hosting, removing the dependency on closed black boxes for critical infrastructure work.

The Architecture of Scale

Kimi K2.7-Code is a Mixture-of-Experts model built for scale. It carries a total of 1 trillion parameters, though only 32 billion are active per token. The architecture relies on 384 experts, selecting eight per token alongside one shared expert. The stack includes 61 layers, with one dedicated dense layer, and employs MLA for attention and SwiGLU for the feed-forward path. To handle visual data, it integrates a MoonViT vision encoder adding 400 million parameters for image and video inputs. Native INT4 quantization is included, and the context window spans 256,000 tokens.
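The routing figures above (384 experts, eight selected per token, plus one shared expert) can be illustrated with a small top-k gating sketch. This is a generic Mixture-of-Experts illustration, not Moonshot's implementation: the softmax over the selected logits and the random router scores are assumptions made for the example.

```python
import math
import random

NUM_EXPERTS = 384   # routed experts, per the article
TOP_K = 8           # routed experts selected per token, per the article

def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k routed experts.

    Weights are a softmax over the selected logits only; this normalisation
    is an assumption, not a documented detail of K2.7-Code.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits, highest first.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

# Hypothetical router scores for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routing = route_token(logits)

# The shared expert runs for every token, so 8 + 1 expert blocks execute.
experts_run_per_token = len(routing) + 1
active_fraction = 32e9 / 1e12  # 32B active of 1T total, per the article
print(experts_run_per_token, f"{active_fraction:.1%}")
```

Only about 3.2% of the total parameters are active for any one token, which is why per-token compute is far below what the 1 trillion headline figure suggests.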

Operational constraints are strict. Thinking mode is mandatory; attempting to disable it triggers an API error. Sampling parameters are locked to a temperature of 1.0, top_p of 0.95, n of 1, and penalties of 0.0. The default maximum output is capped at 32,768 tokens. Deployment is server-class only, with the Hugging Face repository occupying roughly 595 GB on disk.
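Because overriding a locked parameter fails only at request time, a client-side check can catch the mistake earlier. The sketch below is one possible guard, not part of any Kimi SDK. The fixed values come from the article; mapping "penalties" onto the OpenAI-style `presence_penalty` and `frequency_penalty` names is an assumption, and the helper names are hypothetical.

```python
# Fixed values as reported in the article. Mapping "penalties" onto the
# OpenAI-style presence_penalty / frequency_penalty names is an assumption.
LOCKED_PARAMS = {
    "temperature": 1.0,
    "top_p": 0.95,
    "n": 1,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}

def find_overrides(request_kwargs):
    """Return {name: (given, required)} for every locked parameter that
    the caller set to a different value."""
    return {
        name: (request_kwargs[name], required)
        for name, required in LOCKED_PARAMS.items()
        if name in request_kwargs and request_kwargs[name] != required
    }

def strip_locked(request_kwargs):
    """Return a copy of the kwargs without any locked sampling parameters,
    leaving the server to apply its own fixed values."""
    return {k: v for k, v in request_kwargs.items() if k not in LOCKED_PARAMS}

# Hypothetical request that tries to lower the temperature.
request = {"model": "kimi-k2.7-code", "temperature": 0.2, "max_tokens": 4096}
print(find_overrides(request))
print(strip_locked(request))
```

Stripping the locked keys rather than rewriting them keeps the client agnostic about the exact server-side values if they change in a later release.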

Performance Against the Field

Moonshot released six benchmark rows comparing K2.7-Code against K2.6, GPT-5.5, and Claude Opus 4.8. The new model outperforms its predecessor on every metric. The largest absolute gain is on Kimi Code Bench v2, which rises 11.1 points from 50.9 to 62.0; the largest relative gain is on MLS Bench Lite, at +31.5%.

Benchmark             Kimi K2.6  Kimi K2.7-Code  GPT-5.5  Claude Opus 4.8  K2.7 vs K2.6
Kimi Code Bench v2    50.9       62.0            69.0     67.4             +21.8%
Program Bench         48.3       53.6            69.1     63.8             +11.0%
MLS Bench Lite        26.7       35.1            35.5     42.8             +31.5%
Kimi Claw 24/7 Bench  42.9       46.9            52.8     50.4             +9.3%
MCP Atlas             69.4       76.0            79.4     81.3             +9.5%
MCP Mark Verified     72.8       81.1            92.9     76.4             +11.4%
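The final column is a relative improvement over K2.6, not a difference in score points. The short check below recomputes it from the K2.6 and K2.7-Code scores in the table; the function name is illustrative.

```python
# (K2.6 score, K2.7-Code score) pairs copied from the table above.
scores = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "Program Bench": (48.3, 53.6),
    "MLS Bench Lite": (26.7, 35.1),
    "Kimi Claw 24/7 Bench": (42.9, 46.9),
    "MCP Atlas": (69.4, 76.0),
    "MCP Mark Verified": (72.8, 81.1),
}

def relative_gain(old, new):
    """Percentage improvement of new over old."""
    return (new - old) / old * 100.0

for name, (old, new) in scores.items():
    print(f"{name}: +{new - old:.1f} points, +{relative_gain(old, new):.1f}% relative")
```

For Kimi Code Bench v2, an 11.1-point rise on a base of 50.9 works out to the +21.8% in the headline.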

Notably, K2.7-Code surpasses Opus 4.8 on the MCP Mark Verified benchmark, scoring 81.1 against 76.4, though it still trails GPT-5.5's 92.9 there. It also comes within 0.4 points of GPT-5.5 on MLS Bench Lite (35.1 versus 35.5). Each model ran in its own vendor's harness: K2.7-Code in Kimi Code CLI, GPT-5.5 in Codex xhigh, and Opus 4.8 in Claude Code xhigh.

Reasoning Efficiency as a Cost Driver

Moonshot reports a roughly 30% reduction in reasoning-token usage compared to K2.6, framing this as ‘less overthinking.’ In agentic coding, where runs involve hundreds of steps, every plan, retry, and verification incurs a thinking cost. A 30% saving on each of those steps adds up to a substantial total over a long session.

This efficiency manifests in three areas. First, it lowers the output-token cost per task. Second, it accelerates steps, improving the responsiveness of interactive CLI sessions. Third, it extends the number of steps possible before hitting context limits.
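To put a rough number on the first point, the sketch below applies the reported 30% reduction to a hypothetical session. The step count and per-step token count are invented for illustration, and it assumes reasoning tokens are billed at the output rate of $4.00 per 1M tokens listed later in this article.

```python
OUTPUT_PRICE_PER_M = 4.00   # USD per 1M output tokens, from the article's pricing table
REDUCTION = 0.30            # reported reasoning-token reduction versus K2.6

def session_savings(steps, reasoning_tokens_per_step):
    """Return (tokens_saved, usd_saved) for one agentic session.

    Assumes every step spends the same number of reasoning tokens and that
    those tokens are billed as output tokens; both are simplifications.
    """
    baseline_tokens = steps * reasoning_tokens_per_step
    tokens_saved = round(baseline_tokens * REDUCTION)
    usd_saved = tokens_saved / 1_000_000 * OUTPUT_PRICE_PER_M
    return tokens_saved, usd_saved

# Hypothetical session: 300 steps at 2,000 reasoning tokens each.
tokens_saved, usd_saved = session_savings(300, 2_000)
print(f"{tokens_saved:,} tokens saved, about ${usd_saved:.2f}")
```

The saving grows linearly with the number of steps, so the same percentage matters far more for a multi-hour agent run than for a single chat reply.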

Practical Use Cases

  • Repo-scale refactors: The agent targets failing test suites, reading files, editing across modules, and rerunning tests until the suite passes.
  • Code review: Feed the model a pull request diff and ask for a risk analysis; the 256K window accommodates large diffs, logs, and related files simultaneously.
  • MCP tool-use workflows: MCP Mark Verified tests correct tool invocation via the Model Context Protocol, and the model scores 81.1 on it, which makes it a candidate for running CI checks, ticket updates, and file edits in a single loop.
  • Long-context analysis: Accepting text, image, and video input allows documentation, screenshots, and recorded repros to be shared in one prompt.

Implementation and Quickstart

The Kimi API is OpenAI-compatible, using the model string kimi-k2.7-code. Users must not override fixed sampling parameters, or the request will error.

import os
from openai import OpenAI

# Base URL and key per the Kimi API docs at platform.moonshot.ai
client = OpenAI(
    api_key=os.environ.get("MOONSHOT_API_KEY"),
    base_url="https://api.moonshot.ai/v1",
)

messages = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "Refactor utils.py to remove duplicate code."},
]

resp = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=messages,
    max_tokens=32768,  # default cap; also the maximum
    # thinking is enabled by default and cannot be disabled.
    # temperature (1.0), top_p (0.95), n (1), and penalties (0.0) are
    # fixed server-side. Passing any other value returns an error.
)

msg = resp.choices[0].message
print(msg.content)

# Multi-step tool calls: append the full assistant message so that
# reasoning_content is preserved. Dropping it errors on the next turn.
# messages.append(msg.model_dump())

The documentation sets two rules for tool use. First, keep the current turn's reasoning_content in the message history; dropping it causes an error on the next turn. Second, set tool_choice only to "auto" or "none".
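Both rules can be enforced when building the message history. The following is a minimal sketch with hypothetical helper names and a made-up `run_tests` tool; it works on plain dictionaries such as the output of `msg.model_dump()` in the quickstart and makes no network calls. The `role: "tool"` message shape follows the OpenAI-compatible chat format.

```python
ALLOWED_TOOL_CHOICE = ("auto", "none")  # the only values the article says are accepted

def check_tool_choice(tool_choice):
    """Reject any tool_choice other than "auto" or "none" before sending."""
    if tool_choice not in ALLOWED_TOOL_CHOICE:
        raise ValueError(
            f"tool_choice must be one of {ALLOWED_TOOL_CHOICE}, got {tool_choice!r}"
        )
    return tool_choice

def append_assistant_turn(messages, assistant_message):
    """Append the full assistant message, refusing to drop reasoning_content
    from a turn that issued tool calls."""
    if assistant_message.get("tool_calls") and not assistant_message.get("reasoning_content"):
        raise ValueError("assistant turn with tool calls is missing reasoning_content")
    messages.append(dict(assistant_message))
    return messages

def append_tool_result(messages, tool_call_id, content):
    """Append a tool result in the OpenAI-compatible chat format."""
    messages.append({"role": "tool", "tool_call_id": tool_call_id, "content": content})
    return messages

# Hypothetical assistant turn, shaped like msg.model_dump() from the quickstart.
history = [{"role": "user", "content": "Run the test suite."}]
assistant_turn = {
    "role": "assistant",
    "content": "",
    "reasoning_content": "I should run the tests before editing anything.",
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "run_tests", "arguments": "{}"},
    }],
}
append_assistant_turn(history, assistant_turn)
append_tool_result(history, "call_1", "3 passed, 1 failed")
print(check_tool_choice("auto"), len(history))
```

Copying the whole assistant message, rather than rebuilding it from `content` and `tool_calls` alone, is what keeps reasoning_content in context for the next request.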

Market Position and Constraints

Model                  License              Params                 Context      API price (in / out per 1M)
Kimi K2.7-Code         Modified MIT (open)  1T total / 32B active  256K         $0.95 / $4.00
Kimi K2.6              Open-weight          1T-class MoE           256K         ~$0.67–0.95 / ~$3.39–4.00
GPT-5.5                Closed               Not disclosed          Not in Moonshot table
Claude Opus 4.8        Closed               Not disclosed          1M           $5.00 / $25.00
Qwen3-Coder-480B-A35B  Open (Qwen license)  480B / 35B active      256K native  Varies by host

Note: K2.7-Code lists $0.19 per 1M for cached input.
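The list prices above translate into per-task costs as follows. This is a rough calculator, not a billing tool: the token counts are hypothetical, and because the article gives no cached-input rate for Claude Opus 4.8, the sketch charges its full input price for all input tokens, which may overstate that model's cost.

```python
# USD per 1M tokens, taken from the table and note above.
PRICES = {
    "Kimi K2.7-Code": {"input": 0.95, "cached_input": 0.19, "output": 4.00},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},  # no cached rate in the article
}

def task_cost(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one task from list prices.

    cached_input_tokens is the portion of input_tokens served from cache;
    models without a listed cached rate are charged the full input price.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    cached_rate = price.get("cached_input", price["input"])
    fresh_tokens = input_tokens - cached_input_tokens
    return (
        fresh_tokens * price["input"]
        + cached_input_tokens * cached_rate
        + output_tokens * price["output"]
    ) / 1_000_000

# Hypothetical agent run: 2M input tokens (1.5M of them cached), 200K output tokens.
for model in PRICES:
    cost = task_cost(model, 2_000_000, 200_000, cached_input_tokens=1_500_000)
    print(f"{model}: ${cost:.2f}")
```

Under these assumed token counts the run costs about $1.56 on K2.7-Code and $15.00 at Opus 4.8 list prices; the gap narrows or widens with the share of cached input and the output volume.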

Strengths and Weaknesses

Strengths

  • Open weights under Modified MIT, offering a viable self-host path.
  • Broad, consistent gains over K2.6 on coding and agent evaluations.
  • Low API pricing relative to closed frontier models.
  • Beats Opus 4.8 on the MCP Mark Verified benchmark (company-reported).

Weaknesses

  • All headline numbers are first-party at launch.
  • Thinking mode cannot be disabled.
  • Sampling controls are locked to fixed values.