STRATEGY · 2026.06.11 · 5 min read

The Agent Test

One question separates agent projects worth building from the ones Gartner expects to be cancelled. Most teams cannot answer it. Here is the test we use.

Hamad Pervaiz
Founder & CEO, BearPlex

Gartner went looking for real agentic AI vendors and found about 130 of them. Not 130 categories. Not 130 percent growth. About 130 actual companies, out of the thousands currently selling something with the word agent on it. The rest are doing what Gartner calls agent washing. The label changed. The product did not.

The same Gartner research expects over 40 percent of agentic AI projects to be cancelled by the end of 2027. Two in five, dead before they ship anything that matters.

I run an AI engineering studio, and I think agents are the most interesting engineering problem of this decade. So read what follows as the opposite of a rant. Those Gartner numbers do not offend me. They describe the projects we decline.

There is one question that separates the agent projects worth a quarter of your roadmap from the ones headed for Gartner's cancellation column. I ask it before any agent work starts, and I would want every CTO to ask it before signing anything.

The Test

What does this agent do that a workflow plus an LLM call does not?

That is the whole test. It has to be answered concretely, in a sentence or two, by the person proposing the build. Not with adjectives. With behavior.

Be precise about what the alternative is, because it is stronger than most people give it credit for. A workflow plus an LLM call means deterministic steps, with a model invoked at the nodes where language or judgment is needed. Extraction here. Classification there. A drafted reply at the end, gated by a human before anything irreversible happens. Retries in code. Fallbacks in code. Logs you can actually read. You can build it in weeks, test it like normal software, and explain to a regulator exactly what it will and will not do.
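To make the alternative concrete, here is a minimal sketch of a workflow plus an LLM call, under stated assumptions. The ticket-handling scenario and the names `call_model`, `with_retries`, and `handle_ticket` are hypothetical and exist only for illustration. `call_model` is a stub standing in for whatever model client you actually use.

```python
from dataclasses import dataclass
from typing import Callable


def call_model(prompt: str) -> str:
    """Hypothetical stub for a real model call; replace with your provider's client."""
    if prompt.startswith("classify:"):
        return "refund" if "refund" in prompt.lower() else "other"
    return "Draft reply: we have received your request."


def with_retries(fn: Callable[[str], str], arg: str,
                 attempts: int = 3, fallback: str = "") -> str:
    """Retries and fallbacks live in code: try up to `attempts` times, then fall back."""
    for _ in range(attempts):
        try:
            return fn(arg)
        except Exception:
            continue
    return fallback


@dataclass
class Result:
    category: str
    draft: str
    sent: bool


def handle_ticket(text: str,
                  approve: Callable[[str], bool],
                  model: Callable[[str], str] = call_model) -> Result:
    # Step 1 (fixed at design time): classification node.
    category = with_retries(model, f"classify: {text}", fallback="other")
    # Step 2 (fixed at design time): drafting node.
    draft = with_retries(model, f"draft a {category} reply to: {text}", fallback="")
    # Step 3 (fixed at design time): a human gates the irreversible action.
    sent = bool(draft) and approve(draft)
    return Result(category, draft, sent)
```

The model is consulted twice, at fixed points, and the path through `handle_ticket` is the same for every input. That is what makes it testable like normal software: pass in a stub model and a stub approver and assert on the result.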

Most of what gets pitched as an agent today is exactly this, with a loop drawn around the diagram. The model is consulted at fixed points. The path through the work never changes. Nothing decides anything that was not already decided at design time. If the autonomy in your agent can be expressed as an if statement, it is not autonomy. It is an if statement.

And that is fine. A workflow is not a consolation prize. It ships faster, fails more legibly, and costs far less to run. The mistake is not building one. The mistake is building one inside an agent framework, with agent complexity, agent failure modes, and an agent invoice.

What Earns the Word

The test is a filter, not a wall. Some answers pass it, and when they do, an agent is the only honest architecture.

An agent earns the word when the path through the task cannot be enumerated in advance:

  • The next step depends on what the system just discovered, and the branching is too rich to draw. Think of investigating a production incident, or reconciling records across systems that disagree in ways nobody catalogued.
  • The system has to choose which tool to reach for, not just which parameters to pass to a fixed one.
  • It has to recover from failures nobody predicted, not just retry the ones somebody did.
  • The environment pushes back. Each action changes what the right next action is.
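The criteria above can be sketched as a loop, again under assumptions. `run_agent`, `toy_choose`, and the three incident-investigation tools are hypothetical names invented for this example. In a real system `choose` would be a model call. Here it is a stub, and a stub made of a few if statements would itself fail the test, so read it only as a way to make the loop shape runnable.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]
History = List[Tuple[str, str]]  # (tool name, observation) pairs


def run_agent(goal: str,
              tools: Dict[str, Tool],
              choose: Callable[[str, History], Optional[str]],
              max_steps: int = 10) -> History:
    """Loop until `choose` returns None or the step budget runs out."""
    history: History = []
    for _ in range(max_steps):
        # The next tool is picked at run time, from what was just observed.
        tool_name = choose(goal, history)
        if tool_name is None:
            break
        try:
            observation = tools[tool_name](goal)
        except Exception as exc:
            # A failure becomes an observation, so the next choice can react to it.
            observation = f"error: {exc}"
        history.append((tool_name, observation))
    return history


# Hypothetical tools for an incident investigation.
def check_logs(goal: str) -> str:
    return "timeouts from db"


def check_db(goal: str) -> str:
    raise ConnectionError("db unreachable")


def check_network(goal: str) -> str:
    return "packet loss on subnet A"


def toy_choose(goal: str, history: History) -> Optional[str]:
    """Stand-in for a model deciding the next step."""
    if not history:
        return "check_logs"
    last_tool, last_obs = history[-1]
    if last_tool == "check_logs" and "db" in last_obs:
        return "check_db"
    if last_obs.startswith("error"):
        return "check_network"
    return None


TOOLS: Dict[str, Tool] = {
    "check_logs": check_logs,
    "check_db": check_db,
    "check_network": check_network,
}
trace = run_agent("why is checkout slow?", TOOLS, toy_choose)
```

The difference from a workflow is structural. There is no fixed sequence of steps in `run_agent`, only a budget and a decision function, and the failed database check changes what happens next instead of triggering a retry someone planned for.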

If you can draw the flowchart, build the flowchart.

Concrete passing answers sound like this: the task tree is different for every input and only visible one step at a time. Or: the plan made at step one is routinely invalidated by what step three finds, and the system has to keep going anyway. Failing answers sound like this: it will be more flexible. It will learn over time. The demo was incredible.

One more marker worth writing down. A workflow that handles the common cases and routes the rest to a human will usually beat a fully autonomous system that handles everything unevenly. If a human checkpoint would not embarrass your business case, you did not need an agent. You needed software.
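A human checkpoint of that kind can be very small. The sketch below is illustrative only: the request kinds, the `KNOWN_HANDLERS` table, the `route` function, and the 0.8 threshold are all assumptions, not anything measured or recommended.

```python
from typing import Callable, Dict, List

# Hypothetical handlers for the common cases.
KNOWN_HANDLERS: Dict[str, Callable[[str], str]] = {
    "password_reset": lambda request: "sent reset link",
    "invoice_copy": lambda request: "emailed invoice",
}


def route(kind: str, confidence: float, request: str,
          human_queue: List[str], threshold: float = 0.8) -> str:
    """Automate a request only when it is a known kind and the classifier is confident."""
    handler = KNOWN_HANDLERS.get(kind)
    if handler is not None and confidence >= threshold:
        return handler(request)
    # Everything else goes to a person instead of being handled unevenly.
    human_queue.append(request)
    return "escalated to human"
```

The `kind` and `confidence` values would come from a single classification call upstream. Everything after that is ordinary code.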

Why Everyone Says Agent

If the test is this simple, why does anyone fail it? Because every incentive in the room points the other way.

Budgets flow toward the word. Boards ask about it by name. Vendors know the same product commands a different price as an agent platform than it does as a workflow tool. Internal teams know an agent initiative gets headcount and a keynote slide, while an automation cleanup gets neither. Nobody in that chain is lying, exactly. They are rounding up. Thousands of vendors rounded up, and Gartner counted about 130 that did not need to.

The demand side is sprinting just as hard. Deloitte found 74 percent of organizations expect at least moderate AI agent use by 2027, while only 21 percent have a mature governance model for agentic AI. When appetite runs that far ahead of discipline, the market will happily sell you appetite.

The test also has a sibling, and it deserves a sentence here: what does building this give us that buying it does not? MIT NANDA found externally purchased AI tools reached deployment about 67 percent of the time, against roughly 33 percent for internal builds. If someone already sells a tool that passes the agent test for your problem, buying it is not a defeat. Pride is not an architecture.

The Free Question

We wrote this exact check into the AI Readiness Audit, one of the 48 checks the audit puts in front of engineering leaders, because it is the cheapest item on the list with the most expensive failure mode behind it. Like everything in the reports library, every statistic in the audit was re-verified against primary sources in June 2026. The numbers above are the ones that pushed me to write this essay.

The test disciplines us as much as anyone. BearPlex deploys through 90-day War Rooms: cross-functional pods embedded with the client, shipping to production inside the window. Ninety days does not forgive a wrong architecture chosen in week one. If we let a workflow problem wear an agent costume, we are the ones standing in the wreckage on day 60. The filter protects the client's quarter and ours.

So we apply it early. Discovery at BearPlex is not billed, and the first conversation is with an engineer, not an account manager. That engineer will ask you the question in this essay, and no is a real possible outcome. I am proud of how often we say it. For some companies, a clear no on a doomed agent build is the most valuable thing any vendor will give them this year.

If there is a concrete answer, we will build the agent, and we will build it properly, with evaluation and guardrails designed in from day one. If there is not, you just saved a budget, a quarter, and one very uncomfortable board meeting.

The question is free. The quarter is not.

Filed under strategy · 2026.06.11