OpenAI has started a limited preview of GPT-5.6, splitting the release into three specific tiers: Sol, Terra, and Luna. Access is currently restricted to a small group of trusted partners via the API and Codex, though broader availability in ChatGPT and the API is expected in the coming weeks. The company shared initial details with the US government before publishing the announcement.
In this article
This update is primarily structural. GPT-5.6 introduces a tiered model family, two new reasoning modes, and a heavier safety stack. The naming convention has shifted as well; the version number now indicates the generation, while the names denote durable capability tiers that can advance on their own schedules.
Tiered Models
Sol is the flagship model. OpenAI cites improvements in coding, biology, and cybersecurity. Terra aims to match GPT-5.5 performance while costing roughly half as much. Luna offers strong capability at OpenAI’s lowest price point.
Pricing is calculated per one million tokens. Sol costs $5 for input and $30 for output. Terra is priced at $2.50 for input and $15 for output. Luna is the cheapest option at $1 for input and $6 for output. Terra is approximately twice as cheap as GPT-5.5, while Sol matches the pricing of the previous generation.
Caching behaviour has also changed. Prompt caching now supports explicit cache breakpoints with a 30-minute minimum cache life. Cache writes cost 1.25 times the uncached input rate, while cache reads retain a 90% discount. OpenAI plans to run Sol on Cerebras hardware, targeting up to 750 tokens per second by July.
New Reasoning Modes
GPT-5.6 adds two controls for reasoning effort. The first is a new max setting. This gives the Sol model the most time to reason deeply. The second is ultra mode. Instead of a single model working alone, ultra uses subagents to split complex work and accelerate it.
The max setting deepens a single chain of reasoning. The ultra mode coordinates several workers on one task. Both options trade latency and cost for accuracy on long-horizon problems.
Benchmark Results
OpenAI shared a preview set of evaluations. Sol set a new state of the art on Terminal-Bench 2.1. This benchmark tests command-line workflows requiring planning, iteration, and tool coordination.
On Agent’s Last Exam, Sol was the only model to pass the halfway mark, reaching 50.9% in code mode. On GeneBench v1, Sol beat GPT-5.5 on long-horizon genomics analysis while using fewer tokens. On ExploitBench, OpenAI reports Sol was competitive with Mythos Preview using about one-third of the output tokens.
| Model / mode | Terminal-Bench 2.1 |
|---|---|
| GPT-5.6 Sol (ultra) | 91.91% |
| GPT-5.6 Sol (max) | 88.76% |
| Claude Mythos 5 | 88% |
| GPT-5.5 | 83.4% |
Use Cases
- Long-horizon coding agents: Sol’s Terminal-Bench gains suit multi-step CLI automation. An example is an agent that plans, edits files, runs tests, then iterates.
- High-volume production: Terra fits chat features and document processing at scale. An example is summarising thousands of support tickets each day at lower cost.
- Latency-sensitive apps: Luna suits autocomplete, routing, and simple extraction. An example is classifying inbound emails before a heavier model handles edge cases.
- Defensive security work: Sol targets vulnerability research and patching. An example is reviewing a codebase to find and fix a memory bug.
Strengths and Open Questions
Strengths
- Clear tiering across cost, speed, and intelligence
- New
ultrasubagent mode for complex, parallel work - Reported state-of-the-art on Terminal-Bench 2.1
- Token-efficiency gains on biology and cyber benchmarks
- A documented, layered safety stack
Open questions
- Access is limited to about 20 partners at preview
- Public benchmark detail is partial until general availability
- Safeguards may block some legitimate dual-use security work
- Pricing sits above some open-weight competitors like GLM-5.2
- Real-world latency for
maxandultrais not yet public



