Most GPT-5 coverage is a benchmark story. This brief is a procurement and architecture story, because that is what actually decides whether GPT-5 belongs in your stack. As of July 2026, "GPT-5" is not one model. It is a fast-moving platform: the original August 2025 release has already been deprecated, four point-releases have shipped and been retired behind it, and the current flagship is gpt-5.5. If you are evaluating OpenAI for a production system, the deprecation section of this brief matters at least as much as the capability section.
What it actually is
GPT-5 launched on August 7, 2025 (the retired API snapshot ID, gpt-5-2025-08-07, carries the date). The GPT-5 System Card described the consumer product as a unified system: a fast model for most questions, a deeper reasoning model for harder problems, and "a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent." The router was trained on live signals, including when users switched models and measured correctness, and OpenAI stated the plan was to eventually integrate everything into a single model.
That integration is essentially what the 5.x line delivered. The current flagship, gpt-5.5 (snapshot gpt-5.5-2026-04-23), exposes the routing decision to you as a parameter: reasoning_effort accepts none, low, medium (default), high, and xhigh. There is no hidden router in the API path. You choose, per request, how much thinking you buy. Specs that matter: a 1,050,000-token context window, 128,000 max output tokens, and a knowledge cutoff of December 1, 2025.
The engineering consequence of the router history is worth stating plainly: in the API, routing is your job. The consumer product made "which model, how much reasoning" an invisible platform decision; the API hands it back to you as the single biggest cost and latency lever you control.
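A minimal sketch of what owning that decision can look like: a small function that maps a task description to an effort level before the request is built. The effort values come from the list above. The task kinds, the escalation rules, and the shape of the returned dict are illustrative assumptions, not OpenAI's request schema.

```python
from dataclasses import dataclass

# Effort levels as listed for gpt-5.5 in this brief, cheapest first.
EFFORT_LEVELS = ("none", "low", "medium", "high", "xhigh")


@dataclass(frozen=True)
class Task:
    """Hypothetical description of one request; field names are illustrative."""
    kind: str             # e.g. "classification", "agent_step", "planning"
    tool_count: int = 0   # number of tools exposed to the model
    retries: int = 0      # how many times this task has already failed


# Assumed starting points per task kind; tune these from your own evals.
BASE_EFFORT = {
    "classification": "none",
    "extraction": "low",
    "agent_step": "medium",
    "planning": "high",
}


def choose_effort(task: Task) -> str:
    """Pick a reasoning effort: start from the task kind, escalate on risk."""
    level = EFFORT_LEVELS.index(BASE_EFFORT.get(task.kind, "medium"))
    if task.tool_count > 5:
        level += 1  # assumption: wide tool surfaces deserve one extra step
    level += task.retries  # assumption: each failed attempt buys one more step
    return EFFORT_LEVELS[min(level, len(EFFORT_LEVELS) - 1)]


def build_request(task: Task, prompt: str, model: str = "gpt-5.5") -> dict:
    """Assemble request parameters to hand to whatever client you use.

    The dict keys are placeholders; map them onto your SDK's real fields.
    """
    return {
        "model": model,
        "reasoning_effort": choose_effort(task),
        "prompt": prompt,
    }
```

The point of the design is that the effort decision lives in one testable function rather than being scattered across call sites, so it can be re-tuned whenever the evals or the price list change.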
Commercial terms
There is no license to negotiate in the open-weights sense; this is a hosted API under OpenAI's business terms. What you are actually agreeing to, structurally, is a dependency on OpenAI's model lifecycle. Their published policy commits to at least 6 months of notice before retiring generally available models and at least 3 months for specialized variants, per the deprecations page. Six months is the notice floor you should plan around for general-purpose models, because it is the number OpenAI plans around.
Real API cost
Verified against the official pricing page as of July 2026, per million tokens:
| Model | Input | Cached input | Output |
|---|---|---|---|
| gpt-5.5 | $5.00 | $0.50 | $30.00 |
| gpt-5.5-pro | $30.00 | n/a | $180.00 |
| gpt-5.4 | $2.50 | $0.25 | $15.00 |
| gpt-5.4-mini | $0.75 | $0.075 | $4.50 |
| gpt-5.4-nano | $0.20 | $0.02 | $1.25 |
Three modifiers change the real bill more than the headline rates:
- Batch is a flat 50% off (gpt-5.5 at $2.50/$15.00). Any workload that tolerates asynchronous turnaround should be there by default.
- Priority processing costs roughly 2.5x (gpt-5.5 at $12.50/$75.00). If your latency SLO forces the priority tier, your model is 2.5x more expensive than the number in your business case.
- Long context is a premium product. Prompts beyond 272K input tokens are billed at 2x input and 1.5x output on gpt-5.5. The million-token window exists, but the economics push you toward retrieval and context discipline rather than stuffing the window.
The cached-input rate (10% of base) is the quiet workhorse: agent systems that keep a stable prompt prefix routinely see the majority of their input tokens billed at that rate.
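The modifiers above are easier to reason about as arithmetic. The sketch below estimates the cost of a single request from the rates in the table. How the modifiers stack is an assumption made for illustration: the tier multiplier is applied to the whole request, the long-context premium is applied to cached tokens too, and the premium is modelled only for gpt-5.5 because that is the only model the text states it for. Check the pricing page before relying on any of this for a business case.

```python
# USD per million tokens, copied from the pricing table in this brief.
# "cached" is None where the table lists n/a.
PRICES = {
    "gpt-5.5":      {"input": 5.00,  "cached": 0.50,  "output": 30.00},
    "gpt-5.5-pro":  {"input": 30.00, "cached": None,  "output": 180.00},
    "gpt-5.4":      {"input": 2.50,  "cached": 0.25,  "output": 15.00},
    "gpt-5.4-mini": {"input": 0.75,  "cached": 0.075, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20,  "cached": 0.02,  "output": 1.25},
}

TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "priority": 2.5}
LONG_CONTEXT_THRESHOLD = 272_000  # input tokens; stated for gpt-5.5 only


def estimate_cost(model, input_tokens, output_tokens,
                  cached_tokens=0, tier="standard"):
    """Estimate the USD cost of one request under the assumptions above."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    # Models without a cached rate bill cached tokens at the full input rate.
    cached_rate = price["cached"] if price["cached"] is not None else price["input"]

    uncached_tokens = input_tokens - cached_tokens
    input_cost = uncached_tokens * price["input"] + cached_tokens * cached_rate
    output_cost = output_tokens * price["output"]

    # Long-context premium: 2x input and 1.5x output past the threshold.
    if model == "gpt-5.5" and input_tokens > LONG_CONTEXT_THRESHOLD:
        input_cost *= 2.0
        output_cost *= 1.5

    total_per_million = (input_cost + output_cost) * TIER_MULTIPLIER[tier]
    return total_per_million / 1_000_000
```

Under these assumptions, a gpt-5.5 request with 100,000 input tokens and 10,000 output tokens comes to $0.80 at standard rates, $0.44 when 80,000 of the input tokens are cached, and $0.40 on the batch tier.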
Tool use and eval behavior that matters
gpt-5.5's tool surface is broad and first-party: web search, file search, code interpreter, hosted shell, apply-patch, computer use, MCP, and structured outputs are all supported per the model page. For production agents, the details we test for in every model engineering evaluation:
- Reasoning effort is the reliability dial, not just a cost dial. Tool-selection and argument-construction errors drop as effort rises, and the none setting turns the model into a fast, cheaper executor that is only appropriate for well-constrained tool schemas. Run your own task-level evals per effort level; the deltas are workload-specific and OpenAI publishes no per-effort reliability numbers.
- Structured outputs are the contract. Schema-constrained generation is the single most effective control we know for keeping multi-step agents parseable at step forty.
- The platform tools create soft lock-in. Hosted shell, file search, and computer use are excellent, and every one you adopt makes the eventual portability conversation harder. Wrap them behind your own interfaces from day one.
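One way to read "wrap them behind your own interfaces" is sketched below for file search. Everything here is hypothetical: the FileSearch interface, the class names, and the method signature are invented for illustration, and the hosted adapter is deliberately left unimplemented rather than guessing at a vendor SDK call.

```python
from typing import Protocol


class FileSearch(Protocol):
    """Hypothetical interface that application code depends on."""

    def search(self, query: str, limit: int = 5) -> list[str]:
        ...


class InMemoryFileSearch:
    """Local stand-in, useful for tests and as a portability baseline."""

    def __init__(self, documents: dict[str, str]) -> None:
        self._documents = documents

    def search(self, query: str, limit: int = 5) -> list[str]:
        # Naive case-insensitive substring match; returns document names.
        needle = query.lower()
        hits = [name for name, text in self._documents.items()
                if needle in text.lower()]
        return sorted(hits)[:limit]


class HostedFileSearch:
    """Placeholder adapter for a vendor-hosted tool (not implemented here)."""

    def search(self, query: str, limit: int = 5) -> list[str]:
        raise NotImplementedError("wire the vendor SDK call in here")


def answer_with_sources(question: str, backend: FileSearch) -> dict:
    """Application code sees only the FileSearch interface, never the vendor."""
    return {"question": question, "sources": backend.search(question, limit=3)}
```

Swapping vendors then means writing one new adapter class, and the in-memory version gives the eval suite something to run against when no hosted tool is available.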
Deprecation history as platform risk
This is the section buyers skip and regret. The verified record from OpenAI's own deprecations page, as of July 2026:
- The original GPT-5 (gpt-5-2025-08-07), plus its mini and nano variants, was deprecated on June 11, 2026 and shuts down December 11, 2026. Sixteen months from launch to shutdown.
- gpt-5-chat-latest, gpt-5-codex, and the entire 5.1 codex family were deprecated April 22, 2026 and shut down July 23, 2026.
- gpt-5.2-chat-latest and gpt-5.3-chat-latest shut down August 10, 2026. The 5.2 and 5.3 generations lived well under a year.
- Even the GPT-4 era finally ends: gpt-4-0613, gpt-4-turbo, and the original gpt-4o snapshot shut down October 23, 2026.
The pattern is consistent: OpenAI ships fast and retires fast, honoring the 6-month floor and rarely much more. Practical implications: pin snapshots, budget a re-evaluation cycle roughly every two quarters, keep your eval suite runnable on demand so a forced migration is a regression test rather than a research project, and never hardcode a model ID deeper than one config file. Prompt behavior shifts between point releases; the deprecation notice is also a behavior-change notice.
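As a sketch of keeping that discipline mechanical, the check below compares pinned model IDs against known shutdown dates and could run in CI or a scheduled job. The dates are the ones listed in this brief; entries whose exact IDs the brief does not give (the 5.1 codex family, the original gpt-4o snapshot, the mini and nano variants) are left out rather than guessed. The status names and the 90-day warning window are assumptions.

```python
from datetime import date

# Shutdown dates as listed in this brief. In practice, keep this table in
# the same config file that owns your model IDs and update it from the
# vendor's deprecations page.
SHUTDOWN_DATES = {
    "gpt-5-2025-08-07": date(2026, 12, 11),
    "gpt-5-chat-latest": date(2026, 7, 23),
    "gpt-5-codex": date(2026, 7, 23),
    "gpt-5.2-chat-latest": date(2026, 8, 10),
    "gpt-5.3-chat-latest": date(2026, 8, 10),
    "gpt-4-0613": date(2026, 10, 23),
    "gpt-4-turbo": date(2026, 10, 23),
}


def runway_report(pinned_models, today, warn_days=90):
    """Return a (model, status, days_left) tuple for each pinned model ID."""
    report = []
    for model in pinned_models:
        shutdown = SHUTDOWN_DATES.get(model)
        if shutdown is None:
            # No shutdown date in our table; not proof that none exists.
            report.append((model, "no-shutdown-listed", None))
            continue
        days_left = (shutdown - today).days
        if days_left <= 0:
            status = "retired"
        elif days_left <= warn_days:
            status = "migrate-now"
        else:
            status = "scheduled"
        report.append((model, status, days_left))
    return report
```

Failing the build on "retired" and opening a ticket on "migrate-now" is one way to turn a deprecation notice into a scheduled task instead of an incident.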
When to use it, and when not
Use the GPT-5 platform when:
- You want the broadest first-party tool ecosystem (hosted shell, computer use, MCP) with one vendor and one bill.
- Your workload spans wildly different difficulty levels and you can exploit reasoning_effort plus the 5.4 mini/nano ladder to route cost.
- Batch-eligible volume work dominates: $2.50/$15.00 for frontier-class capability is genuinely hard to beat.
Do not use it when:
- Data cannot leave your infrastructure boundary. There are no weights; an open-weights model is the answer to that constraint, not a different API vendor.
- Your organization cannot absorb a forced model migration every 12 to 18 months. The deprecation record above is the base rate, not a worst case.
- You need multi-year behavioral stability for a regulated, validated workflow; pinned snapshots still retire.
How we would architect it for a client
The same routing-gateway discipline we apply to open-weights deployments applies here, with vendor risk added to the design inputs:
- A model-agnostic gateway owns model IDs, so a deprecation is a config change plus an eval run, not a code change. Every prompt lives in version control with a golden-set eval attached.
- Effort-tiered routing: gpt-5.4-nano or mini at low effort for extraction and classification volume, gpt-5.5 at medium for standard agent steps, high or xhigh reserved for the requests that measurably need it. The comparison with Anthropic's lineup is rerun quarterly, because both vendors move.
- Caching and batch by default: stable prompt prefixes to exploit the $0.50 cached rate, and every non-interactive pipeline on the batch tier.
- A deprecation playbook, not a deprecation panic: subscribe to the deprecations feed, hold the previous snapshot and the new one in A/B during migration windows, and treat the 6-month notice as the start of a scheduled project.
GPT-5 is a strong default choice in mid-2026. It stays a strong choice only for teams that engineer for the platform's velocity instead of pretending it is infrastructure.
