The Boring Levers: Why Scaling Failures Are Self-Inflicted

Eight percent.

That is the average CPU utilization across production Kubernetes clusters, measured by Cast AI across tens of thousands of them. Not at 4am on a holiday weekend. On average. Datadog's State of Cloud Costs research prices the same picture from another angle: 83% of container spend is tied to idle resources.

Read those two numbers again, because they describe a choice. Thousands of engineering teams built distributed systems for traffic that never arrived, and now they pay monthly rent on the waiting room. The clusters did not fail to scale. They scaled exactly as designed. They scaled the idle.

I have seen this enough times to call it what it is. Most scaling failures are self-inflicted. The database does not break because you grew. It breaks because the team skipped the boring work and went straight to the exciting kind.

Borrowed Problems

The pattern repeats with remarkable discipline. A SaaS platform with four thousand users adopts microservices, an event bus, a service mesh, and a multi-region Kubernetes footprint, because that is what serious companies run. Except the companies being copied adopted those things under duress, at a scale where nothing simpler survived. Copy the solution without the duress and you get all of the operational cost with none of the necessity.

Your platform with four thousand users does not have Google's problems. It has Google's YAML.

Premature architecture is a loan. You borrow complexity today and pay interest on it every on-call shift: more deployment paths, more failure modes, more dashboards, more 2am pages that can only be explained through distributed traces. The collateral is your roadmap, because every hour spent operating the big system is an hour not spent building the product that was supposed to need it.

The cruel part is that the bill still arrives at the database. Microservices do not rescue a Postgres instance with missing composite indexes and a cache hit rate nobody measures. They just make it harder to see which service is hammering it.

The Late Sharders

The two most instructive database scaling stories in SaaS belong to Notion and Figma, and people routinely take the wrong lesson from both.

Notion runs one of the most famous sharded Postgres fleets in the industry. Read their engineering team's own account of the migration, though, and notice what forced it. Not slow queries. VACUUM stalls and transaction ID wraparound risk, a Postgres failure mode that does not degrade gracefully: it halts every write on the system. When Notion moved, they moved well. Sharded by workspace ID, with 480 logical shards chosen precisely because 480 divides cleanly in ways a power of two never will.
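
To make the divisibility point concrete, here is a minimal Python sketch. It is not Notion's code: hashing the workspace ID with SHA-256, and the names logical_shard and physical_host, are assumptions made for illustration. It counts how many fleet sizes split 480 and 512 evenly, then maps a logical shard to a host for fleets of 32 and 96.

```python
import hashlib

LOGICAL_SHARDS = 480  # figure from the article; everything else here is illustrative


def divisors(n: int) -> list[int]:
    """Every fleet size that splits n logical shards evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]


def logical_shard(workspace_id: str, shards: int = LOGICAL_SHARDS) -> int:
    """Map a workspace ID to a stable logical shard (assumed hashing scheme)."""
    digest = hashlib.sha256(workspace_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shards


def physical_host(shard: int, hosts: int, shards: int = LOGICAL_SHARDS) -> int:
    """Assign contiguous blocks of logical shards to hosts; reject uneven splits."""
    if shards % hosts != 0:
        raise ValueError(f"{shards} shards do not divide evenly across {hosts} hosts")
    return shard // (shards // hosts)


if __name__ == "__main__":
    # Count of even splits available to each candidate shard count.
    print(len(divisors(480)), len(divisors(512)))
    shard = logical_shard("workspace-123")  # hypothetical workspace ID
    # The same logical shard lands on a valid host in both fleet sizes.
    print(shard, physical_host(shard, 32), physical_host(shard, 96))
```

480 has 24 divisors to 512's 10, and it divides evenly by both 32 and 96, so the fleet can grow without redrawing logical shard boundaries. Asking the same function to spread 512 shards across 96 hosts raises an error.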

Figma is even clearer. Their database stack grew almost 100x after 2020 before the databases team horizontally sharded anything. The runway came from unglamorous levers: caching, read replicas, and roughly a dozen vertically partitioned databases. When sharding finally became unavoidable, they shipped it in-house in about nine months.

Here is the part people miss. Notion's stated lesson from the whole experience was to shard earlier. That sounds like a vote against everything I just argued, until you notice what it actually means: once the symptoms have names, stop hesitating. It was not a regret about the sequence. The sequence was right. The regret was the pause at the end of it, after the system had already asked.

Both companies sharded late. Figma sharded once; Notion came back for a second pass, tripling its fleet from 32 to 96 instances. By the team's own account, that re-shard cost users at worst about a second of a saving spinner, because dark reads compared the old and new databases before cutover. Because they waited, both sharded with years of real data about access patterns, tenancy boundaries, and hot spots. Shard on day one and you are guessing at your shard key. The shard key is the one decision in this whole discipline that is brutally expensive to take back.
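
The dark-read step mentioned above can be sketched roughly as follows. This is not Notion's implementation: DarkReader, read_old, read_new, and the min_reads threshold are hypothetical names chosen for illustration. The idea is to keep serving from the old database, shadow-read from the new one, and record every disagreement so that cutover waits for a clean run.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DarkReader:
    read_old: Callable[[str], Any]  # current source of truth
    read_new: Callable[[str], Any]  # candidate database, never served to users here
    mismatches: list[str] = field(default_factory=list)
    total: int = 0

    def read(self, key: str) -> Any:
        """Serve from the old database; compare against the new one on the side."""
        result = self.read_old(key)
        self.total += 1
        try:
            if self.read_new(key) != result:
                self.mismatches.append(key)
        except Exception:
            # A failing shadow read counts as a mismatch, not a user-facing error.
            self.mismatches.append(key)
        return result

    def safe_to_cut_over(self, min_reads: int = 1000) -> bool:
        """Only green-light cutover after enough reads with zero disagreements."""
        return self.total >= min_reads and not self.mismatches


if __name__ == "__main__":
    old_db = {"page-1": "draft", "page-2": "published"}
    new_db = {"page-1": "draft", "page-2": "stale"}
    reader = DarkReader(old_db.get, new_db.get)
    for key in old_db:
        reader.read(key)
    print(reader.mismatches, reader.safe_to_cut_over(min_reads=2))
```

Users only ever see the old database's answer, which is why a comparison failure costs them nothing. A production version would sample reads and run the shadow query off the request path.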

The Levers, In Order

What does boring first actually look like? Roughly this, in roughly this order:

1. Indexes and query plans. Read your slow query log before you read another Kubernetes doc. Most "we need to scale" conversations end here, quietly.
2. Caching. Cache-aside with sensible TTLs covers most of it. It is the cheapest 10x in the entire playbook, and the easiest to get burned by if you never test cache-cold.
3. Read replicas. Most SaaS workloads are read-heavy. Cloning data is dramatically cheaper than splitting it.
4. Vertical partitioning. Move the noisiest tables onto their own hardware. Alongside caching and read replicas, this is what bought Figma its years.
5. Queues. Put a buffer between bursty producers and anything that falls over at peak, then scale the consumers, not the queue.
6. Rightsizing. Datadog finds most container workloads use less than a quarter of the CPU they request. Reclaim what you already pay for before buying more.
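
For the caching lever, a minimal cache-aside sketch in Python might look like this. The CacheAside class, its in-process dictionary store, and the injectable clock are assumptions for illustration. A real deployment would more likely sit in front of Redis or Memcached, but the shape is the same: check the cache, fall through to the database on a miss, store the result with a TTL, and count hits so the hit rate is something you actually measure.

```python
import time
from typing import Any, Callable


class CacheAside:
    def __init__(
        self,
        load: Callable[[str], Any],        # the expensive lookup, e.g. a database query
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read through to the source and repopulate.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so readers do not wait out the TTL."""
        self._entries.pop(key, None)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = CacheAside(load=lambda key: f"row for {key}", ttl_seconds=30)
    cache.get("user:42")  # cold: goes to the loader
    cache.get("user:42")  # warm: served from memory
    print(cache.hit_rate())
```

Starting a fresh instance and replaying production-shaped traffic against it is one cheap way to see what cache-cold does to the database behind it.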

None of this is timid. Data Center Dynamics reported that Stack Overflow serves around 6,000 requests per second from nine on-prem web servers running a tuned monolith. Nine. Know what one well-run box can do before you architect for a fleet.

Every lever on that list shares the properties the exciting ones lack. It is reversible. It is cheap. Its failure modes are documented to death. An index takes an afternoon. A cache policy takes a week. A microservices migration takes a year, and you learn whether it worked at the end.

It is also the conversation I want to have first whenever someone brings us enterprise platform work. Not redesign. Measurement. Which queries, which tenants, which queue depths, which hit rates. The glamorous architecture conversation can wait until the numbers demand it.

Sequencing, Not Timidity

The obvious objection: if the big architecture will be needed eventually, why not build it now and skip the rewrite?

Because "eventually" is doing fraudulent work in that sentence. Most platforms never reach the scale where the boring levers run out. The ones that do arrive there with revenue, real load data, and a team that understands its own system, which is exactly the position you want to occupy when making expensive, irreversible decisions. Figma sharded from strength. Notion sharded with a clear forcing function and a measured plan. Neither was rescued by an architecture built years earlier on guesses.

Boring first does not mean boring forever. It means writing graduation criteria in advance and respecting them. When VACUUM falls behind, when replica lag becomes user-visible, when one tenant's traffic distorts everyone's p99, the system is asking. Answer it then, decisively, the way Notion wished they had. Until that day, every exciting component you do not run is an outage you do not have and a line item you do not pay.
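
Graduation criteria are easier to respect when they are written down as something executable. The sketch below is one hypothetical way to do that in Python. The Metrics and Thresholds names are invented for illustration, and the threshold values are placeholders rather than recommendations. Set them from your own measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    xid_age: int                # e.g. SELECT max(age(datfrozenxid)) FROM pg_database
    replica_lag_seconds: float  # worst observed lag on a read replica
    top_tenant_share: float     # fraction of total load from the busiest tenant


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values (assumptions), not tuning advice.
    xid_age: int = 1_000_000_000
    replica_lag_seconds: float = 5.0
    top_tenant_share: float = 0.30


def graduation_signals(m: Metrics, t: Thresholds = Thresholds()) -> list[str]:
    """Return the named symptoms that have crossed their agreed thresholds."""
    signals = []
    if m.xid_age >= t.xid_age:
        signals.append("VACUUM falling behind: transaction ID age past threshold")
    if m.replica_lag_seconds >= t.replica_lag_seconds:
        signals.append("replica lag is user-visible")
    if m.top_tenant_share >= t.top_tenant_share:
        signals.append("one tenant is distorting shared latency")
    return signals


if __name__ == "__main__":
    today = Metrics(xid_age=250_000_000, replica_lag_seconds=0.4, top_tenant_share=0.12)
    print(graduation_signals(today) or "keep it boring")
```

An empty list means the boring levers still have room. A non-empty one is the system asking, with the symptom already named.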

We wrote the full sequence down. The SaaS Scalability Blueprint is 49 checks across six disciplines, built from what Notion, Figma, Shopify, Slack, and Amazon's own builders actually did, with every statistic re-verified against primary sources in June 2026. Run it as an audit. Each check you fail has a named company that already paid for the lesson, which means you do not have to.

And if you would rather walk through it with someone, our first conversation is with an engineer, not an account manager, and the diagnostic costs nothing. Bring your slow query log.

The teams that scale are not the ones that built for a million users on day one. They are the ones that kept the system boring long enough to earn the interesting problems.

Filed under engineering · 2026.06.11

If this maps to a decision you are making, talk to us.