DeepSeek-V4: a million-token context that agents can actually use

Focusing on long-running agent workloads. Running a frontier open model as an agent today breaks in predictable ways-either the model stops, you reprompt, or the trace blows past the context budget due to tool-call round trips degrading halfway through a task. DeepSeek-V4 is designed to fix these known issues and pave the way for future development.

What makes long-context inference cheap

The architecture leverages two mechanisms: Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), which are interleaved across layers to ensure efficient attention operations. This hybrid approach reduces the overhead associated with maintaining a full context window, making it feasible for long-running agent tasks.

Agent-specific post-training decisions

Preserving reasoning content across tool calls

V4 now preserves the reasoning history across all tool call iterations. This allows an agent to maintain a coherent chain of thought over multiple tool interactions, which is crucial for multi-turn tasks where user messages are sent after several tool executions.

Tool-call schema with dedicated tokens

V4 introduces a new special token and an XML-based tool-call format. This separates string parameters from structured ones to reduce parsing errors that often occur when JSON tools call formats are used, leading to failures like escaping issues in nested quoted content.

DSec: the sandbox for RL rollouts

The paper describes DSec-a Rust platform that exposes multiple execution environments through a unified Python SDK. This setup is critical for training agents against various tool and environment setups, ensuring robust performance across diverse scenarios.

Benchmark results and agent performances

On the benchmark front, V4-Pro-Max shows competitive scores in key areas like terminal tasks, verification benchmarks, and internal R&D coding tests. These metrics suggest that DeepSeek-V4 is on par with leading closed models for agent-related workloads.

The survey of 85 developers using V4-Pro as their daily driver further underscores its potential: 52% indicated it was ready to replace their current primary model, and 39% leaned towards adopting it. These numbers highlight the practical utility of DeepSeek-V4 in real-world applications.

Using these models

deepseek-ai/DeepSeek-V4-Pro (1.6T / 49B activated, instruct): Uses FP4 for MoE expert weights and FP8 elsewhere. Recommended sampling parameters: temperature=1.0, top_p=1.0.
deepseek-ai/DeepSeek-V4-Flash (284B / 13B activated, instruct): Uses the same architecture but with fewer activations for faster inference.
deepseek-ai/DeepSeek-V4-Pro-Base (1.6T / 49B activated, base): Uses only FP8 throughout.
deepseek-ai/DeepSeek-V4-Flash-Base (284B / 13B activated, base): Same as V4-Pro but with fewer activations for faster inference.

The recommended reasoning modes are Non-think (fast), Think High (

<think>

blocks), and Think Max (maximum reasoning effort). For all modes, the sampling parameters remain temperature=1.0 and top_p=1.0.

The long-context retrieval performance is also highlighted in Figure 9 of the technical report, showing that V4-Pro-Max maintains high accuracy up to 256K tokens before slightly decreasing at 1M tokens but still holding above 0.82.

Key Takeaways

V4 addresses key limitations of previous models, enabling them to handle long-context queries efficiently and reliably.
The introduction of a new tool-call schema with dedicated tokens improves the robustness and reliability of agent interactions.
DSec provides a flexible infrastructure for training agents against diverse environments, supporting rapid development cycles.

Source Read original →

DeepSeek-V4: a million-token context that agents can actually use

DeepSeek-V4: a million-token context that agents can actually use

What makes long-context inference cheap

Agent-specific post-training decisions

Preserving reasoning content across tool calls

Tool-call schema with dedicated tokens

DSec: the sandbox for RL rollouts

Benchmark results and agent performances

Using these models

Key Takeaways

Empowering Businesses with AI: Smart Tools, Smarter Business Decisions.

follow us

Popular Tag

Popular Post

Ten advances in mathematics…

Judge denies xAI’s request…

YouTuber Hank Green says…

DeepSeek-V4: a million-token context that agents can actually use

What makes long-context inference cheap

Agent-specific post-training decisions

Preserving reasoning content across tool calls

Tool-call schema with dedicated tokens

DSec: the sandbox for RL rollouts

Benchmark results and agent performances

Using these models

Key Takeaways

Related articles

Empowering Businesses with AI: Smart Tools, Smarter Business Decisions.

follow us

Popular Tag

Popular Post

Ten advances in mathematics…

Judge denies xAI’s request…

YouTuber Hank Green says…