TerminalBench 91.9% · CTF 96.7% · Government preview limits · Cerebras 750 token/s
On June 26, 2026, OpenAI released the GPT-5.6 family (flagship Sol, balanced Terra, and lightweight Luna), the company’s first lineup to use a solar-system naming scheme. Sol tops TerminalBench 2.1 at 91.9% and hits 96.7% on cybersecurity CTF evaluations. All three models crossed OpenAI’s High cybersecurity threshold. Due to a U.S. government security review, only about 20 vetted partner organizations can access the models today. This guide covers pricing and positioning, the major published benchmarks, Cerebras acceleration, the June policy fallout, a head-to-head with Claude Mythos 5, access timelines, use-case picks, safety architecture, and FAQ.
| Model | Positioning | Input Price | Output Price | Highlight |
|---|---|---|---|---|
| GPT-5.6 Sol | Flagship / maximum capability | $5 / 1M tokens | $30 / 1M tokens | TerminalBench 2.1 #1 (91.9%) |
| GPT-5.6 Terra | Balanced / workhorse | $2.50 / 1M tokens | $15 / 1M tokens | Near GPT-5.5 performance, 50% lower cost |
| GPT-5.6 Luna | Lightweight / fast | $1 / 1M tokens | $6 / 1M tokens | High-volume tasks, 80% cheaper than Sol |
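To make the price gaps concrete, the sketch below estimates the bill for one workload on each tier. The per-token prices come from the table above; the function name `estimate_cost` and the example workload (2M input tokens, 500K output tokens) are illustrative assumptions, not anything OpenAI publishes.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from the pricing table above.
PRICES = {
    "sol": (Decimal("5"), Decimal("30")),
    "terra": (Decimal("2.50"), Decimal("15")),
    "luna": (Decimal("1"), Decimal("6")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one workload on the given tier."""
    input_price, output_price = PRICES[model]
    million = Decimal(1_000_000)
    return (input_price * input_tokens + output_price * output_tokens) / million


# Hypothetical workload: 2M input tokens and 500K output tokens.
for tier in PRICES:
    print(tier, estimate_cost(tier, 2_000_000, 500_000))
```

Because Terra and Luna are priced at fixed fractions of Sol on both input and output, the ratio between tiers stays the same whatever the input/output mix is.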
Current status: Per U.S. government request, GPT-5.6 is limited to about 20 approved partner organizations. Broad availability is expected within weeks. Context window is reported at roughly 1.5M tokens (pending full system card confirmation).
OpenAI launched GPT-5.6 on June 26, 2026 with a new celestial naming system: Sol (the Sun) for the flagship, Terra (Earth) for the balanced tier, and Luna (the Moon) for the lightweight tier.
The rollout was not smooth. Following President Trump’s June 2 executive order, the White House coordinated the Office of Science and Technology Policy (OSTP) and the Office of the National Cyber Director (ONCD) to require a government security review before broad release. This is the first time the U.S. government has formally required an AI company to limit a frontier model launch. OpenAI CEO Sam Altman said the company would cooperate, but also pushed back publicly:
“We don’t believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them.”
Most users and enterprises cannot access GPT-5.6 through ChatGPT or the public API yet
June 2026 was meant to be a “super launch month,” but OpenAI, Anthropic, and Google all had flagship releases blocked or delayed
Limited preview means agent workflows, Codex integration, and benchmark reproduction may have to wait several weeks, into July
Policy uncertainty adds hidden cost to model selection and budget planning
Teams should prepare a macOS dev environment that can validate new model capabilities the moment access opens
Sol is OpenAI’s most capable model to date, built for hard programming, long-horizon cybersecurity research, and multi-step agentic workflows.
Two new reasoning modes:
Pricing: $5 / 1M input tokens, $30 / 1M output tokens (same as GPT-5.5)
Terra is the enterprise workhorse for high-volume customer support, internal tools, and document analysis. Performance is close to GPT-5.5 at 50% lower cost—the best value for large-scale deployment. Pricing: $2.50 / 1M input, $15 / 1M output.
Luna targets high-frequency, low-latency tasks: summarization, drafting, and routine automation. Luna is also the first non-flagship OpenAI model to earn High ratings in both cybersecurity and biology. Pricing: $1 / 1M input, $6 / 1M output.
GPT-5.6 is the first OpenAI product line where all three tiers triggered OpenAI’s High cybersecurity risk classification.
TerminalBench 2.1 includes 89 complex command-line planning problems, testing multi-step tool use, iterative repair, and task coordination in realistic agent settings.
| Model | Score | Mode |
|---|---|---|
| GPT-5.6 Sol | 91.9% | Ultra (multi-agent) |
| GPT-5.6 Sol | 88.8% | Standard |
| Claude Mythos 5 | 88.0% | Standard |
| GPT-5.5 | 83.4% | Standard |
| Gemini 3.1 Pro Preview | 70.7% | Standard |
Sol displaced Claude Mythos 5 after only 17 days at the top—Mythos 5 had claimed #1 on June 9.
| Model | Task Completion Rate (Code Mode) |
|---|---|
| GPT-5.6 Sol | 50.9% (only model above 50%) |
| GPT-5.6 Luna | Slightly above GPT-5.5 |
| Model | CTF Hit Rate |
|---|---|
| Sol | 96.7% |
| Terra | 91.84% |
| Luna | 85.19% |
ExploitBench: Sol matches Anthropic’s Mythos Preview on ExploitBench while using only about one-third of the output tokens, cutting enterprise security research cost sharply.
Safety note: OpenAI testing shows Sol can identify vulnerabilities and exploit primitives in Chromium and Firefox codebases, but cannot autonomously construct complete, functional exploit chains. It remains below OpenAI’s “Cyber Critical” threshold.
Starting in July, GPT-5.6 Sol will deploy on Cerebras hardware for select enterprise customers, reaching up to 750 tokens per second.
For context, most frontier models today output between 50 and 150 tokens per second. At 750 tokens per second, the time to generate a response could shrink to between one-fifth and one-fifteenth of what it is today, a meaningful shift for real-time coding assistants and streaming AI applications.
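The arithmetic behind that comparison can be checked with a short sketch. It assumes a steady decode rate and ignores time to first token and network overhead; the 3,000-token response length is a hypothetical example, not a measured figure.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response at a steady decode rate.

    Ignores time to first token and network overhead.
    """
    return output_tokens / tokens_per_second


RESPONSE_TOKENS = 3_000  # hypothetical response length

for rate in (50, 150, 750):
    seconds = generation_seconds(RESPONSE_TOKENS, rate)
    print(f"{rate:>4} tokens/s: {seconds:.1f} s")
```

The ratio between any two rates does not depend on the response length, so the 5x to 15x range holds for short and long outputs alike under these assumptions.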
The executive order gives U.S. government agencies up to 30 days of pre-release access to review frontier AI models for national security risks. Compliance is not legally mandatory, but in practice the order constrained launch timing.
| Company | Model | Status |
|---|---|---|
| OpenAI | GPT-5.6 Sol / Terra / Luna | Limited preview (~20 partner orgs) |
| Anthropic | Claude Fable 5 / Mythos 5 | Forced offline June 12 via export control |
| Google | Gemini 3.5 Pro | Delayed to July (originally June) |
June 2026 was supposed to be the biggest month in AI history. Instead, all three flagship releases were blocked at the door.
| Dimension | GPT-5.6 Sol | Claude Mythos 5 |
|---|---|---|
| TerminalBench 2.1 | 91.9% (Ultra) / 88.8% | 88.0% |
| ExploitBench | Near-identical to Mythos Preview, ~1/3 tokens | Data not public |
| Input price | $5 / M | $10 / M (currently offline) |
| Availability | Limited preview, general release within weeks | Offline due to export control |
| Context window | ~1.5M tokens | 200K tokens |
Bottom line: Sol leads on TerminalBench and delivers comparable security research capability at half the input price. Claude Fable 5 may still lead on SWE-Bench Pro and other dimensions; the full GPT-5.6 system card will clarify the picture once published.
Now (June 2026): About 20 government-vetted trusted partners via API and Codex only; ChatGPT users cannot access GPT-5.6 yet
Expected July 2026: ChatGPT general availability (Plus and Pro first), public API access
Cerebras Sol: Enterprise deployment at up to 750 token/s
Polymarket forecast: Traders assign roughly 87% probability that GPT-5.6 is broadly released by July 31, 2026
Full system card: Complete benchmark report expected at general release
| Your Need | Recommended Model |
|---|---|
| Complex code generation, debugging, multi-step agent tasks | Sol |
| Enterprise document analysis, support, high-volume API calls | Terra |
| Summarization, drafting, routine automation | Luna |
| Flagship capability on a tighter budget | Terra (GPT-5.5-level performance, 50% lower cost) |
| Latency-critical real-time apps (after July) | Sol on Cerebras |
GPT-5.6 represents OpenAI’s progress across three dimensions:
Capability: Sol’s Ultra multi-agent mode tops TerminalBench 2.1 at 91.9%, ending Claude Mythos 5’s 17-day reign
Efficiency: Security research performance comparable to Anthropic’s Mythos Preview on ExploitBench, using roughly one-third of the output tokens
Speed: Cerebras deployment at up to 750 tokens per second, starting in July for select enterprise customers, could reshape what real-time AI applications can do
The release also sets a precedent: the U.S. government formally intervened in a frontier model launch for the first time. The balance between national security and open access will shape how AI models ship for years to come.
Because all three GPT-5.6 tiers crossed OpenAI’s High cybersecurity classification, safety was a primary engineering focus:
Red-teaming confirmed Sol cannot autonomously engineer a complete, functional exploit chain against hardened real-world targets. OpenAI’s Deployment Safety System Card documents the full evaluation methodology.
Not yet for the general public. Currently limited to about 20 trusted partner organizations via API and Codex. Full ChatGPT rollout is expected within weeks, with Plus and Pro users first (July 2026).
Sol leads on TerminalBench 2.1 at 91.9% versus Claude Mythos 5 at 88.0%. Claude Fable 5 leads on SWE-Bench Pro, but official GPT-5.6 SWE-Bench scores have not been published yet. Sol is the better value—comparable or better performance at a lower price.
Ultra mode deploys multiple AI subagents that work in parallel on different parts of a task, then synthesize a unified result. It significantly boosts performance on complex tasks but uses considerably more tokens—best reserved for genuinely hard agent workflows.
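The fan-out and synthesize pattern described here can be sketched in a few lines. This is a generic illustration of parallel subagents, not OpenAI's implementation: `run_subagent`, `synthesize`, and `solve` are hypothetical names, and the subagent is a stub that returns a string where a real one would call a model.

```python
import asyncio


async def run_subagent(name: str, subtask: str) -> str:
    """Stand-in for one subagent; a real one would call a model here."""
    await asyncio.sleep(0)  # yield control, simulating asynchronous work
    return f"[{name}] finished: {subtask}"


def synthesize(partials: list[str]) -> str:
    """Merge partial results into a single answer (here, a plain join)."""
    return "\n".join(partials)


async def solve(subtasks: list[str]) -> str:
    # Fan out: one subagent per subtask, all scheduled concurrently.
    jobs = [run_subagent(f"agent-{i}", s) for i, s in enumerate(subtasks, start=1)]
    partials = await asyncio.gather(*jobs)  # results keep the input order
    # Fan in: combine the partial results into one response.
    return synthesize(list(partials))


print(asyncio.run(solve(["reproduce the bug", "write a fix", "add a regression test"])))
```

Each subagent produces its own output, which is one reason a multi-agent run would be expected to consume more tokens than a single pass over the same task.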
The U.S. government, via the White House, OSTP, and ONCD, requested OpenAI limit access during a security review following President Trump’s June 2 executive order. OpenAI complied but publicly stated it opposes this becoming permanent practice.
Up to 750 tokens per second—roughly 5 to 15 times faster than most current frontier models (50 to 150 token/s). Launching July 2026 for select enterprise customers as Cerebras expands capacity.
Reported at approximately 1.5 million tokens, up from GPT-5.5’s 1 million token context. Official confirmation is expected with the full system card release.
All three carry OpenAI’s High cybersecurity risk rating—meaning significantly elevated capability in vulnerability research. OpenAI built layered safeguards including real-time classifiers and red-teaming, and confirmed the models cannot autonomously build complete functional exploits.
GPT-5.6 Sol’s Ultra multi-agent architecture and 91.9% TerminalBench score signal a new capability tier for Codex, OpenClaw, and other agent workflows. During the government preview window, most developers working from a Windows or Linux primary machine still cannot fully validate integrations that depend on Keychain, Xcode, and GUI debugging paths in the Apple ecosystem.
Renting a remote Mac avoids depreciation, sleep policies, and OS update risk on owned hardware while you keep API keys and repositories under your control. You work on a production-like macOS desktop to run GPT-5.6 Codex integrations and agent acceptance tests as soon as access opens. To prepare before general release, browse plans on VNCMac via the Mac rental pricing page.
Sources: OpenAI official announcement, Deployment Safety System Card, VentureBeat, SiliconAngle, TechTimes. Data as of June 27, 2026.