Is Claude Sonnet 4.5 still available, or is it deprecated?

As of July 2026 it is fully available but sits in Anthropic's legacy-models tier. It is not deprecated: in the current lineup only Opus 4.1 carries a deprecation notice. The model ID claude-sonnet-4-5-20250929 is a pinned snapshot on the Claude API, Amazon Bedrock, and Google Cloud, so existing deployments keep running with stable behavior. The practical risk is staleness (a January 2025 knowledge cutoff), not sudden retirement.

What does Claude Sonnet 4.5 cost?

Verified against Anthropic's pricing docs as of July 2026: $3 per million input tokens and $15 per million output tokens. Prompt cache reads are $0.30 (10% of base input), 5-minute cache writes $3.75, 1-hour writes $6, and the Batch API halves everything to $1.50/$7.50. On Bedrock and Google Cloud, regional or multi-region endpoints add a 10% premium over global routing.

Does Claude Sonnet 4.5 have a 1M token context window?

No. Per Anthropic's current documentation it has a 200K-token context window and 64K max output. The 1M window, at standard pricing and with server-side compaction support, belongs to Claude Sonnet 4.6 and Sonnet 5. If your workload genuinely needs more than 200K, the answer is usually Sonnet 4.6 (same $3/$15 price, same tokenizer) rather than window-stuffing on 4.5, though better retrieval beats a bigger window more often than teams expect.

Should we migrate from Sonnet 4.5 to Sonnet 5?

Run the math before the upgrade reflex. Sonnet 5's introductory pricing is $2/$10 through August 31, 2026, but it uses a newer tokenizer that produces roughly 30% more tokens for the same text, so the intro period is closer to cost-neutral than it looks, and from September 1 the same workload costs about 30% more than on 4.5 at the restored $3/$15. Migrate for capability (1M context, compaction, higher output limits) if your evals show it pays, and consider Sonnet 4.6 as the no-inflation middle path.

Why did Sonnet 4.5 become the default model for production agents?

It hit a specific combination first: fast-class latency at $3/$15, strong tool use, extended thinking, and long-horizon stability (Anthropic reported it holding focus for over 30 hours on multi-step tasks at launch). It also has context awareness: the API injects the remaining token budget and updates it after each tool call, so the model paces work against real capacity. Those properties, plus 10x-cheaper cache reads on stable prompt prefixes, match exactly how agent loops spend money.

Are the launch benchmark numbers (77.2% SWE-bench Verified) trustworthy?

They are Anthropic's self-reported figures from the September 2025 announcement, and we treat all vendor-reported benchmarks the same way regardless of vendor: directionally useful, never decision-grade. Nine months on, the more relevant fact is deployment history, since a large base of production agents was built and validated on this model. For your decision, run task-level evals on your own workload; that is the standard we apply in every engagement.

Does BearPlex deploy Claude Sonnet 4.5 in client work?

Yes. It has been a standard workhorse lane in our agent architectures: cache-first prompt design, difficulty-based routing with Haiku 4.5 underneath, and pinned snapshots defended by golden-set evals in CI. For new builds we now start evaluations at Sonnet 4.6 and Sonnet 5, but we do not force-migrate healthy 4.5 deployments; a validated pinned model is an asset, and migration is a data decision, not a calendar one.

Claude Sonnet 4.5: The Engineering Decision Brief

Claude Sonnet 4.5 is the model a large share of today's production agents were built on, and that is exactly why it deserves a sober brief in mid-2026. It now sits in Anthropic's legacy tier: still fully available, still not deprecated, but with two successors above it. The engineering question is no longer "is Sonnet 4.5 good," it is "do we pin it, or do we take the migration." This brief gives you the verified numbers for both sides of that decision.

What it actually is

Claude Sonnet 4.5 shipped on September 29, 2025, and Anthropic's launch positioning was unusually direct: "the best coding model in the world" and "the strongest model for building complex agents." The self-reported launch numbers were 77.2% on SWE-bench Verified and 61.4% on OSWorld (up from Sonnet 4's 42.2% four months earlier), and Anthropic reported observing it "maintaining focus for more than 30 hours on complex, multi-step tasks." Treat all of those as vendor-reported; the durable claim they support is the design intent, which the market then validated: Sonnet 4.5 became the default engine for long-horizon agent loops.

The current, verified spec sheet from the official model docs, as of July 2026: API ID claude-sonnet-4-5-20250929 (a pinned snapshot), 200K-token context window, 64K max output tokens, extended thinking supported, reliable knowledge cutoff of January 2025 (training data through July 2025). It is available on the Claude API, Amazon Bedrock, and Google Cloud, and the 4.5 generation is where Bedrock's global-versus-regional endpoint split begins.

One under-appreciated production feature: context awareness. Per the context-window docs, the API injects Sonnet 4.5's remaining token budget into the system prompt and updates it after tool calls. The model paces long tasks against real remaining capacity instead of guessing, which is one concrete reason its long-horizon agent behavior holds up.

Commercial terms

Hosted API under Anthropic's commercial terms; there are no weights. The platform-risk picture is the mirror image of OpenAI's: Anthropic's published record moves slower. As of July 2026, Sonnet 4.5 is a legacy model but not deprecated; in the current lineup only Opus 4.1 carries a deprecation notice. Model IDs are pinned snapshots, so behavior does not shift under you. What you are pricing in is not sudden retirement but gradual staleness: a January 2025 knowledge cutoff gets more expensive to compensate for every quarter.

Real API cost

Verified against the official pricing page as of July 2026, per million tokens:

Base: $3 input / $15 output.
Prompt caching: 5-minute cache writes $3.75 (1.25x), 1-hour writes $6 (2x), cache reads $0.30 (0.1x).
Batch API: $1.50 / $7.50 (a flat 50% off).
Cloud endpoints: regional or multi-region routing on Bedrock and Google Cloud adds a 10% premium over global endpoints, a 4.5-generation change worth catching in cloud cost reviews.

The cache-read rate is the whole economics of agent loops at this tier. A production agent re-sends its system prompt, tool definitions, and history on every step; with a stable prefix, the bulk of that input bills at $0.30 instead of $3. In our model engineering work, getting cache discipline right on a Sonnet-class agent routinely moves input cost by high double-digit percentages, which is more than most model-swap decisions move it.

Context economics, and the successor math

Sonnet 4.5's 200K window is the honest constraint in 2026. The 1M-token context window is a Sonnet 4.6 and Sonnet 5 feature, and on those models it is the default at standard pricing per the long-context pricing docs. Server-side compaction, Anthropic's managed answer to conversations that outgrow the window, is also a 4.6-and-later feature. On Sonnet 4.5 you manage context yourself: retrieval, summarization checkpoints, and tool-result pruning. The one mercy is graceful overflow, since the 4.5 generation returns a model_context_window_exceeded stop reason rather than hard-failing the request.

Now the migration math, which is where this brief earns its keep. The successors are Sonnet 4.6 ($3/$15, 1M context, 128K output, same tokenizer) and Sonnet 5 (1M context, 128K output, and introductory pricing of $2/$10 through August 31, 2026, returning to $3/$15 from September 1). The intro price looks like a straight 33% cut. It is not, for one verified reason: per Anthropic's pricing docs, Sonnet 5 uses a newer tokenizer that produces roughly 30% more tokens for the same text, while Sonnet 4.6 and earlier keep the old tokenizer. Through August, those two effects roughly cancel and Sonnet 5 is approximately cost-neutral against 4.5 on the same workload. From September 1, the same workload on Sonnet 5 costs on the order of 30% more than it did on Sonnet 4.5, unless the newer model's quality lets you cut tokens elsewhere. Sonnet 4.6 is the migration that buys the 1M window and compaction with no tokenizer inflation and no price change.

When to use it, and when not

Pin Claude Sonnet 4.5 when:

You have a validated production agent already running on it. A pinned snapshot that passes your evals is an asset; do not migrate on marketing cadence.
Your workload fits comfortably in 200K with good context hygiene, and your token bill is tokenizer-sensitive.
You need the proven cost/latency point: "Fast" latency class at $3/$15 with 0.1x cache reads is still, in July 2026, one of the best-understood price/performance positions in the market.

Do not choose it when:

You are starting a new build. Start evals at Sonnet 4.6 and Sonnet 5; only land on 4.5 if your task evals say so, which occasionally they do.
Your documents or agent sessions genuinely need the 1M window or server-side compaction rather than better retrieval. See our RAG vs long-context discussion before deciding they do.
A January 2025 knowledge cutoff is a real liability for your domain and you are not grounding with retrieval or search.

How we would architect it for a client

Pin, but instrument. claude-sonnet-4-5-20250929 in one config file, golden-set evals in CI, and a standing quarterly bake-off against the current Sonnet line so the migration decision is data, not vibes.
Cache-first prompt architecture. Stable system prompt and tool definitions at the prefix, volatile context at the tail, 1-hour cache for long-running agents. This is the highest-ROI engineering hour on any Sonnet-class deployment.
Context discipline over context size. Tool-result pruning, summarization checkpoints around the 150K mark, and retrieval instead of window-stuffing. Teams that build this muscle on 200K get materially cheaper 1M-window behavior if they later migrate, because a full million-token prompt is never the cheap path.
Route by difficulty. Haiku 4.5 ($1/$5) underneath for classification and extraction volume, Sonnet 4.5 as the agent workhorse, Opus-class or a reasoning model above it for the rare hard cases, with the OpenAI comparison rerun on your own tasks each quarter.

Sonnet 4.5's brief is ultimately about discipline: it rewards teams that treat a good model as infrastructure to be measured and defended, and it quietly punishes teams that chase every successor by default.

Claude Sonnet 4.5Frontier LLM

What it actually is

Commercial terms

Real API cost

Context economics, and the successor math

When to use it, and when not

How we would architect it for a client

Frequently asked

Related work

Related reading

Shipping frontier llm in production?