The app works. That is the new part.
Built over a weekend with Claude Code or Codex, it signs users up, takes their money, sends the onboarding emails, renders a respectable dashboard. Two years ago that was a quarter of work for a funded team. Today it is one determined person, a long Saturday, and a tab full of prompts.
I think this is genuinely great. The tools are not the problem. Agentic coding has collapsed the distance between idea and artifact further than anything since the compiler. People who could never afford custom software now have it. Founders validate in days what used to consume a seed round. I run an engineering studio with 65+ engineers, so I am supposed to feel threatened by all of this. I do not. I feel like a structural engineer watching the whole county discover it can build its own Ferris wheels. Busy years ahead.
Because there is a pattern in what this wave produces: the apps are shippable, and they are not survivable. Those are different properties. Only one of them was ever in the prompt.
What the Prompt Never Asked For
Vibe-coding optimizes for the visible. The screen, the flow, the demo. That is exactly why it feels miraculous: everything you can point at works.
But most of what makes software survivable is invisible, and agentic tools, left unsupervised, skip it silently.
- Data model discipline. The schema grows by accretion, one prompt at a time. Each feature adds a column, a JSON blob, a duplicate source of truth. Nothing gets normalized because nobody asked.
- Authorization boundaries. Authentication is a solved prompt. Authorization is design work. Who can see whose records, and under which role? Vibe-coded apps reliably check that you are logged in and rarely check what you are allowed to touch.
- Observability. No structured logs, no traces, no alerts. When it breaks at 2am, the only debugging tool left is asking the model what it thinks it wrote.
- Tests. Either the model wrote them and the model graded them, or the demo was the test suite. Both fail the first time a change actually matters.
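To make the authorization bullet concrete, here is a minimal Python sketch. Every name in it (`User`, `Invoice`, `INVOICES`, both handler functions) is hypothetical, and the dictionary stands in for a real database. It illustrates the shape of the gap, not any particular app's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    1: Invoice(id=1, owner_id=10),
    2: Invoice(id=2, owner_id=20),
}


def get_invoice_authn_only(user: Optional[User], invoice_id: int) -> Invoice:
    """Authentication only: any logged-in user can fetch any invoice."""
    if user is None:
        raise PermissionError("login required")
    return INVOICES[invoice_id]


def get_invoice_authorized(user: Optional[User], invoice_id: int) -> Invoice:
    """Authentication plus an ownership check on the record itself."""
    if user is None:
        raise PermissionError("login required")
    invoice = INVOICES[invoice_id]
    if invoice.owner_id != user.id:
        raise PermissionError("not allowed to read this invoice")
    return invoice
```

With these definitions, user 10 can read invoice 2 through the first function even though it belongs to user 20. The second function raises `PermissionError` for the same request.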
None of this is the tools' fault. A senior engineer driving Claude Code gets the schema, the policy layer, the traces, and the tests, because they ask for them. The tool answers the questions it is given. The weekend builder does not yet know which questions exist. That gap, between what got generated and what was never specified, is where the serious work of the next two years lives.
The Audit Arrives
For a while, none of it matters. Ten users, low stakes, the app glides. The invisible layers stay invisible right up until something good happens.
Then it does. A growth spike. A first enterprise customer with a security questionnaire. A procurement team asking about SOC 2. An investor running technical due diligence. Real users, real load, real auditors: three tests the weekend app has never faced, often arriving in the same quarter.
We already know how AI-built software does when reality starts grading it. S&P Global Market Intelligence found that in 2025, 42% of companies abandoned most of their AI initiatives before they reached production, up from 17% a year earlier. MIT's NANDA initiative found that about 95% of generative AI pilots show no measurable P&L impact. Those numbers describe enterprise projects with budgets, committees, and steering decks. The weekend app has none of that armor, and it hits the same wall, because the wall was never about resources. Shipping is not the hard part anymore. Surviving what you shipped is.
Audits are not kind even to the grown-ups. At Japan IT Week this spring, we ran live security diagnostics on 47 Japanese enterprises from our booth, and not one came back clean. Those were established companies with real engineering organizations. Now extend that curve to software whose entire provenance is a chat transcript.
Makeover, Not Makeup
The instinct, when one of these products starts creaking, is to order a debug pass. Fix the slow queries, patch the auth, sprinkle in some tests. I say no to that scope, and the no is the most useful thing I can offer.
A debug pass assumes the structure is sound and the defects are local. In a vibe-coded system the defects are the structure. You cannot patch your way from no authorization model to an authorization model. You design one, then rebuild around it. The same goes for the schema, the observability stack, and the test strategy. Makeup hides the problem until the next incident. A makeover replaces what the speed skipped.
The frame has three parts.
- Keep the product. The flows, the validated demand, the decisions users already voted for with their time and money. That work is real. It was bought with weekend speed, and it is the most valuable thing in the repository.
- Rebuild the skeleton. A schema designed on purpose. Authorization as an explicit policy layer instead of a scattering of if-statements. Logs, traces, and alerts wired in before scale demands them. Tests that encode intent, so the next prompt cannot quietly undo it.
- Redesign the face. Vibe-coded apps all wear the same face: the same component defaults, the same gradient hero, the same spacing drift. The surface is its own discipline, and it deserves the same deliberate treatment as the skeleton: a face designed on purpose, ending in a system, not a fresh coat of paint.
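As one illustration of the skeleton work, an explicit policy layer can be as small as a table of rules behind a single entry point, with tests that state the intent. This is a minimal Python sketch. The roles (`member`, `admin`), the actions, and every name in it are hypothetical, and a real system would likely use a policy library or database-level rules instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Actor:
    id: int
    role: str  # hypothetical roles: "member", "admin"


@dataclass(frozen=True)
class Record:
    id: int
    owner_id: int


Rule = Callable[[Actor, Record], bool]


def _owns(actor: Actor, record: Record) -> bool:
    return record.owner_id == actor.id


def _always(actor: Actor, record: Record) -> bool:
    return True


# The whole authorization model lives in one reviewable table,
# instead of if-statements scattered across handlers.
POLICY: Dict[Tuple[str, str], Rule] = {
    ("member", "read"): _owns,
    ("member", "update"): _owns,
    ("admin", "read"): _always,
    ("admin", "update"): _always,
    ("admin", "delete"): _always,
}


def is_allowed(actor: Actor, action: str, record: Record) -> bool:
    """Single entry point; anything not listed in POLICY is denied."""
    rule = POLICY.get((actor.role, action))
    return rule is not None and rule(actor, record)


# Tests that encode intent: they fail if a later change loosens the rules.
def test_member_cannot_read_someone_elses_record():
    assert not is_allowed(Actor(1, "member"), "read", Record(7, owner_id=2))


def test_member_cannot_delete_own_record():
    assert not is_allowed(Actor(1, "member"), "delete", Record(7, owner_id=1))


def test_unknown_action_is_denied_even_for_admin():
    assert not is_allowed(Actor(1, "admin"), "export", Record(7, owner_id=1))
```

The design choice worth noting is deny by default: a new action stays forbidden until someone adds a row for it. That keeps a forgotten case from silently becoming an open door.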
At BearPlex we run this work the way we run everything: War Rooms, cross-functional pods embedded for 90-day production deployments. The structure is the honesty. Rebuilding under live traffic is a commitment, not a ticket queue.
The Interesting Work
Here is the position that surprises people: I think this rebuild wave is the best engineering work available right now.
Greenfield is overrated. Starting from nothing mostly tests your taste in boilerplate. Rebuilding a live product with paying users, where you cannot stop the world and the previous architect was a language model with no memory of its own decisions, tests everything else: archaeology, judgment, sequencing, nerve. There has always been a word for engineers who can replace the skeleton without dropping the body. The word is senior.
The tools belong inside this work too. The same model that generated the slop, pointed at a real architecture by an engineer who knows what to ask for, becomes a formidable rebuilding instrument. The difference was never the model. It was the questions.
So if you shipped something fast and it is starting to creak, hear this clearly: you did it right. You validated before you invested, which is the correct order. The creaking is the signal that the investment is now due. Talk to us: the first conversation is with an engineer, not an account manager, and the diagnostic costs nothing. And if you would rather see how we think before talking to anyone, the six research reports in our library are public, every statistic in them re-verified against primary sources.
The weekend app was the opening act. The Makeover Era is the show, and we intend to be extremely busy.
