2026.07.03 · Frontier LLM
10 min read

Claude Sonnet 4.5

The model that defined the production-agent price point, and the engineering case for pinning it or moving up now that it sits in Anthropic's legacy tier.

Hamad Pervaiz
Founder & CEO, BearPlex
Reference
Parameters: Undisclosed
Base model: -
License: Proprietary (API)
Publisher: Anthropic
Paper date: 2025.09.29

Claude Sonnet 4.5 is the model a large share of today's production agents were built on, and that is exactly why it deserves a sober brief in mid-2026. It now sits in Anthropic's legacy tier: still fully available, still not deprecated, but with two successors above it. The engineering question is no longer "is Sonnet 4.5 good" but "do we pin it, or do we take the migration." This brief gives you the verified numbers for both sides of that decision.

What it actually is

Claude Sonnet 4.5 shipped on September 29, 2025, and Anthropic's launch positioning was unusually direct: "the best coding model in the world" and "the strongest model for building complex agents." The self-reported launch numbers were 77.2% on SWE-bench Verified and 61.4% on OSWorld (up from Sonnet 4's 42.2% four months earlier), and Anthropic reported observing it "maintaining focus for more than 30 hours on complex, multi-step tasks." Treat all of those as vendor-reported; the durable claim they support is the design intent, which the market then validated: Sonnet 4.5 became the default engine for long-horizon agent loops.

The current, verified spec sheet from the official model docs, as of July 2026: API ID claude-sonnet-4-5-20250929 (a pinned snapshot), 200K-token context window, 64K max output tokens, extended thinking supported, reliable knowledge cutoff of January 2025 (training data through July 2025). It is available on the Claude API, Amazon Bedrock, and Google Cloud, and the 4.5 generation is where Bedrock's global-versus-regional endpoint split begins.

One under-appreciated production feature: context awareness. Per the context-window docs, the API injects Sonnet 4.5's remaining token budget into the system prompt and updates it after tool calls. The model paces long tasks against real remaining capacity instead of guessing, which is one concrete reason its long-horizon agent behavior holds up.

Commercial terms

Hosted API under Anthropic's commercial terms; there are no weights. The platform-risk picture is the mirror image of OpenAI's: Anthropic's published deprecation record moves more slowly. As of July 2026, Sonnet 4.5 is a legacy model but not deprecated; in the current lineup only Opus 4.1 carries a deprecation notice. Model IDs are pinned snapshots, so behavior does not shift under you. What you are pricing in is not sudden retirement but gradual staleness: a January 2025 knowledge cutoff gets more expensive to compensate for with every passing quarter.

Real API cost

Verified against the official pricing page as of July 2026, per million tokens:

  • Base: $3 input / $15 output.
  • Prompt caching: 5-minute cache writes $3.75 (1.25x), 1-hour writes $6 (2x), cache reads $0.30 (0.1x).
  • Batch API: $1.50 / $7.50 (a flat 50% off).
  • Cloud endpoints: regional or multi-region routing on Bedrock and Google Cloud adds a 10% premium over global endpoints, a 4.5-generation change worth catching in cloud cost reviews.

The cache-read rate is the whole economics of agent loops at this tier. A production agent re-sends its system prompt, tool definitions, and history on every step; with a stable prefix, the bulk of that input bills at $0.30 instead of $3. In our model engineering work, getting cache discipline right on a Sonnet-class agent routinely moves input cost by high double-digit percentages, which is more than most model-swap decisions move it.
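The cache arithmetic above can be sketched as a small calculator. This is a rough model under stated assumptions, not a billing tool: it uses the per-million-token rates listed above, assumes a fixed-size stable prefix (system prompt plus tool definitions) that is written to the 5-minute cache once and never expires between steps, and ignores output tokens and history growth. The function names and the example token counts are hypothetical.

```python
# USD per million tokens, from the pricing list above.
BASE_INPUT = 3.00
CACHE_WRITE_5M = 3.75
CACHE_READ = 0.30
MTOK = 1_000_000


def input_cost_uncached(prefix_tokens: int, tail_tokens: int, steps: int) -> float:
    """Every step re-sends prefix + tail at the base input rate."""
    return (prefix_tokens + tail_tokens) * steps / MTOK * BASE_INPUT


def input_cost_cached(prefix_tokens: int, tail_tokens: int, steps: int) -> float:
    """Step 1 writes the prefix to cache; later steps read it back.

    Assumes the cache entry stays warm for the whole run (an assumption).
    The volatile tail is always billed at the base input rate.
    """
    if steps <= 0:
        return 0.0
    write = prefix_tokens / MTOK * CACHE_WRITE_5M
    reads = prefix_tokens * (steps - 1) / MTOK * CACHE_READ
    tail = tail_tokens * steps / MTOK * BASE_INPUT
    return write + reads + tail


if __name__ == "__main__":
    # Hypothetical agent: 20K-token stable prefix, 2K-token tail, 50 steps.
    plain = input_cost_uncached(20_000, 2_000, 50)
    cached = input_cost_cached(20_000, 2_000, 50)
    print(f"uncached: ${plain:.3f}  cached: ${cached:.3f}")
    print(f"input cost reduction: {1 - cached / plain:.0%}")
```

For this hypothetical run the input bill drops from about $3.30 to about $0.67, which is the kind of movement the paragraph above describes. Real savings depend on how much of each request is actually a stable prefix.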

Context economics, and the successor math

Sonnet 4.5's 200K window is the honest constraint in 2026. The 1M-token context window is a Sonnet 4.6 and Sonnet 5 feature, and on those models it is the default at standard pricing per the long-context pricing docs. Server-side compaction, Anthropic's managed answer to conversations that outgrow the window, is also a 4.6-and-later feature. On Sonnet 4.5 you manage context yourself: retrieval, summarization checkpoints, and tool-result pruning. The one mercy is graceful overflow, since the 4.5 generation returns a model_context_window_exceeded stop reason rather than hard-failing the request.
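A minimal sketch of client-side context management, assuming a simplified message shape. The `kind` field, the placeholder string, and both function names are illustrative and are not part of any SDK; only the `model_context_window_exceeded` stop-reason string comes from the text above. The idea is to keep the most recent tool results verbatim and replace older ones with a short marker, and to treat the overflow stop reason as a signal to compact and retry.

```python
from typing import Dict, List

# Stop reason named in the text above for graceful overflow.
OVERFLOW_STOP_REASON = "model_context_window_exceeded"
PLACEHOLDER = "[tool result pruned]"  # hypothetical marker text


def prune_tool_results(
    messages: List[Dict[str, str]], keep_last: int = 3
) -> List[Dict[str, str]]:
    """Return a copy where all but the newest `keep_last` tool results
    have their content replaced by a placeholder. The input is not mutated."""
    tool_idx = [i for i, m in enumerate(messages) if m.get("kind") == "tool_result"]
    # Guard keep_last == 0: a [:-0] slice would be empty, not the full list.
    stale = set(tool_idx[:-keep_last]) if keep_last > 0 else set(tool_idx)
    return [
        {**m, "content": PLACEHOLDER} if i in stale else dict(m)
        for i, m in enumerate(messages)
    ]


def should_compact(stop_reason: str) -> bool:
    """True when the response signals the context window was exceeded."""
    return stop_reason == OVERFLOW_STOP_REASON


if __name__ == "__main__":
    history = [
        {"kind": "user", "content": "audit the repo"},
        {"kind": "tool_result", "content": "ls output ..."},
        {"kind": "tool_result", "content": "grep output ..."},
        {"kind": "tool_result", "content": "test log ..."},
    ]
    for m in prune_tool_results(history, keep_last=1):
        print(m["kind"], "->", m["content"])
```

In a real loop the pruned history would be re-sent after an overflow response; summarization checkpoints and retrieval would sit alongside this rather than replace it.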

Now the migration math, which is where this brief earns its keep. The successors are Sonnet 4.6 ($3/$15, 1M context, 128K output, same tokenizer) and Sonnet 5 (1M context, 128K output, and introductory pricing of $2/$10 through August 31, 2026, returning to $3/$15 from September 1). The intro price looks like a straight 33% cut. It is not, for one verified reason: per Anthropic's pricing docs, Sonnet 5 uses a newer tokenizer that produces roughly 30% more tokens for the same text, while Sonnet 4.6 and earlier keep the old tokenizer. Through August, those two effects largely offset: two-thirds of the price on roughly 1.3x the tokens is about 0.87x the bill, so Sonnet 5 lands close to cost-neutral against 4.5 on the same workload rather than 33% cheaper. From September 1, the same workload on Sonnet 5 costs on the order of 30% more than it did on Sonnet 4.5, unless the newer model's quality lets you cut tokens elsewhere. Sonnet 4.6 is the migration that buys the 1M window and compaction with no tokenizer inflation and no price change.
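The migration math above reduces to one multiplication: relative bill = price ratio × token ratio. The sketch below assumes the roughly 30% tokenizer inflation applies equally to input and output tokens and that the workload text is otherwise unchanged; both are simplifying assumptions, and the function name is illustrative.

```python
def relative_cost(price_ratio: float, token_ratio: float) -> float:
    """Bill on the new model as a multiple of the bill on Sonnet 4.5.

    price_ratio: new per-token price / old per-token price
    token_ratio: new token count / old token count for the same text
    """
    return price_ratio * token_ratio


TOKEN_INFLATION = 1.30  # "roughly 30% more tokens", per the text above

# $2/$10 against $3/$15 is a 2/3 price ratio on both input and output.
sonnet5_intro = relative_cost(2 / 3, TOKEN_INFLATION)
# From September 1 the price ratio returns to 1.0.
sonnet5_standard = relative_cost(1.0, TOKEN_INFLATION)
# Sonnet 4.6: same price, same tokenizer.
sonnet46 = relative_cost(1.0, 1.0)

if __name__ == "__main__":
    print(f"Sonnet 5, intro pricing:    {sonnet5_intro:.2f}x")
    print(f"Sonnet 5, standard pricing: {sonnet5_standard:.2f}x")
    print(f"Sonnet 4.6:                 {sonnet46:.2f}x")
```

On these round numbers the intro period works out to about 0.87x and the standard period to 1.30x. The actual token ratio should be measured on your own prompts before it drives a migration decision.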

When to use it, and when not

Pin Claude Sonnet 4.5 when:

  • You have a validated production agent already running on it. A pinned snapshot that passes your evals is an asset; do not migrate on marketing cadence.
  • Your workload fits comfortably in 200K with good context hygiene, and your token bill is tokenizer-sensitive.
  • You need the proven cost/latency point: "Fast" latency class at $3/$15 with 0.1x cache reads is still, in July 2026, one of the best-understood price/performance positions in the market.

Do not choose it when:

  • You are starting a new build. Start evals at Sonnet 4.6 and Sonnet 5; only land on 4.5 if your task evals say so, which occasionally they do.
  • Your documents or agent sessions genuinely need the 1M window or server-side compaction rather than better retrieval. See our RAG vs long-context discussion before deciding they do.
  • A January 2025 knowledge cutoff is a real liability for your domain and you are not grounding with retrieval or search.

How we would architect it for a client

  1. Pin, but instrument. claude-sonnet-4-5-20250929 in one config file, golden-set evals in CI, and a standing quarterly bake-off against the current Sonnet line so the migration decision is data, not vibes.
  2. Cache-first prompt architecture. Stable system prompt and tool definitions at the prefix, volatile context at the tail, 1-hour cache for long-running agents. This is the highest-ROI engineering hour on any Sonnet-class deployment.
  3. Context discipline over context size. Tool-result pruning, summarization checkpoints around the 150K mark, and retrieval instead of window-stuffing. Teams that build this muscle on 200K get materially cheaper 1M-window behavior if they later migrate, because a full million-token prompt is never the cheap path.
  4. Route by difficulty. Haiku 4.5 ($1/$5) underneath for classification and extraction volume, Sonnet 4.5 as the agent workhorse, Opus-class or a reasoning model above it for the rare hard cases, with the OpenAI comparison rerun on your own tasks each quarter.
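The routing step above can be sketched as a small dispatch function. Everything here except the pinned Sonnet 4.5 model ID is hypothetical: the task-type labels, the difficulty score and its threshold, and the placeholder strings standing in for the cheaper and more capable tiers, which you would replace with real model IDs from your own config.

```python
# Pinned snapshot ID from the text above.
SONNET_4_5 = "claude-sonnet-4-5-20250929"

# Placeholders, not real model IDs: fill these from your own config file.
ROUTES = {
    "bulk": "HAIKU_4_5_MODEL_ID",   # classification and extraction volume
    "agent": SONNET_4_5,            # the agent workhorse
    "hard": "OPUS_CLASS_MODEL_ID",  # rare hard cases
}

BULK_TASKS = {"classification", "extraction"}
HARD_THRESHOLD = 0.8  # hypothetical cutoff; tune against your own evals


def route(task_type: str, difficulty: float) -> str:
    """Pick a model tier for a task.

    `difficulty` is an assumed score in [0, 1] produced upstream,
    for example by a cheap classifier or by retry history.
    """
    if task_type in BULK_TASKS:
        return ROUTES["bulk"]
    if difficulty >= HARD_THRESHOLD:
        return ROUTES["hard"]
    return ROUTES["agent"]


if __name__ == "__main__":
    for task, score in [("extraction", 0.1), ("agent_loop", 0.4), ("agent_loop", 0.9)]:
        print(task, score, "->", route(task, score))
```

Keeping the table in one place pairs naturally with the pinning advice in step 1: a quarterly bake-off then changes a config entry rather than code paths.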

Sonnet 4.5's brief is ultimately about discipline: it rewards teams that treat a good model as infrastructure to be measured and defended, and it quietly punishes teams that chase every successor by default.

Frequently asked

As of July 2026 it is fully available but sits in Anthropic's legacy-models tier. It is not deprecated: in the current lineup only Opus 4.1 carries a deprecation notice. The model ID claude-sonnet-4-5-20250929 is a pinned snapshot on the Claude API, Amazon Bedrock, and Google Cloud, so existing deployments keep running with stable behavior. The practical risk is staleness (a January 2025 knowledge cutoff), not sudden retirement.
