Which framework should I use to build production agents?

LangGraph is our default for stateful production agents: it offers explicit state management, checkpointing, and human-in-the-loop review. The Claude Agent SDK is excellent if you're building primarily with Anthropic models. CrewAI is strong for multi-agent coordination when you actually need it. For simple cases, raw API calls without a framework are perfectly fine: frameworks add value when state management gets non-trivial.

How much does running an agent cost?

Highly variable. A simple 3-step agent with a frontier model: $0.05-$0.50 per execution. Complex multi-step agents with extensive tool use: $1-$10+ per execution. Multi-agent systems can hit $20-$100+ per execution. Cost optimization comes from smaller models for routing, prompt caching, smarter tool selection, and explicit step limits.
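As a rough illustration of where per-execution figures like these come from, the sketch below estimates cost from token counts. The prices and token counts are hypothetical placeholders, not any provider's actual rates; substitute the numbers from your own provider's pricing page and your own traces.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Token usage for one model call in an agent run."""
    input_tokens: int
    output_tokens: int


def execution_cost(steps, usd_per_m_input, usd_per_m_output):
    """Sum the cost of every model call in one agent execution."""
    total = 0.0
    for step in steps:
        total += step.input_tokens / 1_000_000 * usd_per_m_input
        total += step.output_tokens / 1_000_000 * usd_per_m_output
    return total


# Hypothetical prices (USD per million tokens) and a 3-step run whose
# input grows as tool results are appended to the history.
steps = [Step(4_000, 500), Step(9_000, 700), Step(15_000, 1_200)]
cost = execution_cost(steps, usd_per_m_input=3.0, usd_per_m_output=15.0)
print(f"${cost:.4f} per execution")
```

Under these assumed numbers the run costs about $0.12. Input tokens grow at every step because the accumulated history is sent again, which is why prompt caching and step limits have such a large effect on total spend.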

Are agents reliable enough for production?

For appropriately scoped tasks with proper engineering, yes. The pattern that works: narrow scope, comprehensive evaluation, human checkpoints on consequential actions, observability throughout, explicit step and cost limits. The pattern that fails: unbounded autonomy on critical workflows without testing or oversight.
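A human checkpoint on consequential actions can be sketched as a gate in front of the tool executor. This is a minimal sketch under stated assumptions: the action names, `fake_run`, and the `approve` callback are hypothetical stand-ins for your own tools and review queue.

```python
from typing import Callable

# Hypothetical set of actions treated as consequential in this sketch.
CONSEQUENTIAL = {"send_email", "write_database"}


def execute_action(name: str, args: dict,
                   run: Callable[[str, dict], str],
                   approve: Callable[[str, dict], bool]) -> str:
    """Run an action, asking for human approval first when it is consequential."""
    if name in CONSEQUENTIAL and not approve(name, args):
        return f"rejected: {name} was not approved by a reviewer"
    return run(name, args)


# Stand-ins for a real tool executor and a real review step.
def fake_run(name, args):
    return f"executed {name}"


def deny_all(name, args):
    return False


print(execute_action("search_docs", {"q": "refund policy"}, fake_run, deny_all))
print(execute_action("send_email", {"to": "a@example.com"}, fake_run, deny_all))
```

Read-only actions pass straight through, while anything in the consequential set waits on the reviewer's decision. Returning the rejection as an observation, rather than raising, lets the agent see the outcome and choose a different path.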

How do I evaluate an agent?

Build a golden dataset of multi-step task trajectories curated by domain experts: input goal, expected sequence of actions, expected final state. Run candidate agent versions through the dataset and measure task completion rate, action accuracy, cost per task, and latency. Use LLM-as-judge scoring for nuanced quality, with sampled human review for calibration. Critically, track these metrics over time as model versions and prompts change.
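The evaluation loop described above can be sketched as a small harness. Everything named here (`GoldenCase`, `RunResult`, `stub_agent`, the order data) is a hypothetical illustration, and position-by-position action matching is just one simple scoring choice among several.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GoldenCase:
    goal: str
    expected_actions: list
    expected_final_state: dict


@dataclass
class RunResult:
    actions: list
    final_state: dict
    cost_usd: float
    latency_s: float


def action_accuracy(expected, actual):
    """Fraction of actions that match position by position."""
    if not expected:
        return 1.0 if not actual else 0.0
    matches = sum(1 for e, a in zip(expected, actual) if e == a)
    return matches / max(len(expected), len(actual))


def evaluate(cases, run_agent):
    """Run every golden case through `run_agent` and aggregate the metrics."""
    results = [(case, run_agent(case.goal)) for case in cases]
    return {
        "task_completion_rate": mean(
            [1.0 if r.final_state == c.expected_final_state else 0.0
             for c, r in results]),
        "action_accuracy": mean(
            [action_accuracy(c.expected_actions, r.actions) for c, r in results]),
        "cost_per_task_usd": mean([r.cost_usd for _, r in results]),
        "mean_latency_s": mean([r.latency_s for _, r in results]),
    }


cases = [
    GoldenCase("refund order 17", ["lookup_order", "issue_refund"],
               {"order_17": "refunded"}),
    GoldenCase("cancel order 18", ["lookup_order", "cancel_order"],
               {"order_18": "cancelled"}),
]


def stub_agent(goal):
    # Hypothetical agent: handles refunds, takes the wrong action otherwise.
    if goal.startswith("refund"):
        return RunResult(["lookup_order", "issue_refund"],
                         {"order_17": "refunded"}, 0.10, 4.0)
    return RunResult(["lookup_order", "issue_refund"],
                     {"order_18": "refunded"}, 0.20, 6.0)


report = evaluate(cases, stub_agent)
print(report)
```

Storing each report alongside the model version and prompt version that produced it is what makes regressions visible over time.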


What is an Agent (in AI)?

An agent is an AI system that perceives its environment, makes decisions, and takes actions to achieve goals, typically by using tools, executing multi-step plans, and adapting based on feedback. In the modern LLM context, an agent is a language model with the ability to call tools, maintain state across steps, and iterate until a goal is reached, rather than producing a single one-shot response.

Last updated 2026-04-28 · BearPlex AI Engineering Team

Overview

The word 'agent' has decades of history in AI (going back to the symbolic AI era of the 1980s), but its modern usage almost always refers to LLM-powered agents: systems where a language model orchestrates a workflow by calling tools, querying knowledge sources, and reasoning across multiple steps. The shift from 'chatbot' (single-turn Q&A) to 'agent' (multi-step goal-pursuit) is the defining production AI transition of 2024-2026. Successful production agents share three properties: explicit state management (so failures are recoverable), tool calling with validation (so the model can interact with real systems), and human checkpoints (so consequential actions get reviewed before execution).
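To make "explicit state management (so failures are recoverable)" concrete, here is a minimal sketch that checkpoints workflow state to a JSON file after every step and resumes from the last completed step. The step functions and file name are hypothetical, and frameworks such as LangGraph provide their own checkpointing mechanisms, which this does not attempt to reproduce.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "outputs": []}


def save_state(path, state):
    """Write to a temp file, then rename, so a crash mid-write keeps the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_workflow(steps, path):
    """Run steps in order, checkpointing after each one completes."""
    state = load_state(path)
    while state["next_step"] < len(steps):
        output = steps[state["next_step"]](state["outputs"])
        state["outputs"].append(output)
        state["next_step"] += 1
        save_state(path, state)
    return state["outputs"]


# Hypothetical steps; the second one fails on its first attempt.
attempts = {"count": 0}


def fetch(outputs):
    return "fetched"


def flaky_summarize(outputs):
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("transient failure")
    return "summary of " + outputs[-1]


checkpoint = os.path.join(tempfile.mkdtemp(), "agent_state.json")
try:
    run_workflow([fetch, flaky_summarize], checkpoint)
except RuntimeError:
    pass  # the first step's output is already checkpointed

# Resuming skips `fetch` and retries only the failed step.
print(run_workflow([fetch, flaky_summarize], checkpoint))
```

The same saved state is what makes pause/resume and human review possible: a run can stop at a checkpoint, wait for approval, and continue later without repeating completed work.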

How modern LLM agents work

Five core components: (1) The LLM, typically a frontier model like Claude Sonnet 4.5 or GPT-5, or a fine-tuned open-source model. (2) Tools: functions the model can invoke (search the web, query a database, call an API, execute code). The LLM sees a list of tools, picks one, generates arguments, and the framework executes the call. (3) Memory: short-term (the conversation history) and long-term (a knowledge base or vector store the agent retrieves from). (4) Planning: the agent decides what to do next based on the goal, prior actions, and observations. ReAct (Reason + Act) is the canonical pattern. (5) State management: the framework (LangGraph, CrewAI, the Claude Agent SDK) tracks where the agent is in its workflow, enabling pause/resume, checkpointing, and human review.
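The components above fit together in a loop: the model reasons, picks a tool, the framework executes it, and the observation is fed back. The sketch below shows a ReAct-style loop with no framework. `scripted_model` is a hypothetical stand-in for a real LLM call, and both tools are invented for illustration.

```python
from typing import Callable


# Tool registry: name -> callable. Both tools are hypothetical stand-ins.
def search_orders(customer: str) -> str:
    return f"order 17 for {customer}: delayed"


def get_status(order_id: str) -> str:
    return f"order {order_id}: arriving Friday"


TOOLS: dict[str, Callable[..., str]] = {
    "search_orders": search_orders,
    "get_status": get_status,
}


def scripted_model(goal: str, history: list) -> dict:
    """Stand-in for an LLM call: returns a thought plus a tool call or an answer."""
    if not history:
        return {"thought": "Find the customer's order.",
                "tool": "search_orders", "args": {"customer": "Ada"}}
    if len(history) == 1:
        return {"thought": "Check the delayed order.",
                "tool": "get_status", "args": {"order_id": "17"}}
    return {"thought": "I have enough information.",
            "answer": "Order 17 is delayed and arrives Friday."}


def run_agent(goal: str, model, max_steps: int = 5) -> str:
    """Reason, act, observe; repeat until the model answers or steps run out."""
    history = []  # short-term memory: (tool, args, observation) tuples
    for _ in range(max_steps):
        decision = model(goal, history)
        if "answer" in decision:
            return decision["answer"]
        tool = TOOLS.get(decision["tool"])
        if tool is None:
            observation = f"error: unknown tool {decision['tool']}"
        else:
            observation = tool(**decision["args"])
        history.append((decision["tool"], decision["args"], observation))
    return "stopped: step limit reached"


print(run_agent("Where is Ada's order?", scripted_model))
```

In a real system the `model` argument would wrap a provider API call that receives the tool list and history, and a framework would take over the bookkeeping done here by hand.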

Single-agent vs multi-agent

Single-agent systems have one LLM orchestrating everything: simpler to build, easier to debug, often the right starting point. Multi-agent systems coordinate multiple specialized agents (one for research, one for synthesis, one for review). Multi-agent looks impressive in demos but adds complexity: agents need to communicate, conflicts need resolution, and debugging gets harder. We default to single-agent with specialized tools rather than multi-agent with specialized agents: comparable capability with far less operational complexity.

When agents are the right architecture

Agents are the right tool when: (1) the task requires multiple steps with conditional branching (not just transformation of input → output), (2) the task requires interaction with real systems via APIs or tools, (3) the workflow can't be fully scripted in advance because it depends on intermediate observations. Agents are the wrong tool when: a simple prompt or RAG would work (cheaper, faster, easier to debug), the task is stateless and atomic, or you can't tolerate the unpredictability of LLM-driven decisions in your domain.

Use cases

Customer support agents that route, retrieve, and resolve complex multi-step issues
Research and analysis agents that gather information from multiple sources and synthesize findings
Code generation agents that read codebases, propose changes, and run tests iteratively
Data processing agents that classify, route, and transform documents through workflows
Sales and outreach agents that qualify leads, draft emails, and schedule meetings
DevOps and SRE agents for incident response, log analysis, and runbook execution

Examples in production

Anthropic Claude with Computer Use

Anthropic's computer use API enables Claude to navigate screens, click buttons, and operate software interfaces: agentic browser/desktop automation as a first-class capability.

OpenAI Operator

OpenAI's Operator is a consumer-facing autonomous agent that browses the web, fills forms, and completes tasks on behalf of the user.

GitHub Copilot Workspace

GitHub Copilot Workspace operates as an agent across the development lifecycle: understanding issues, planning fixes, generating code, and running tests.

BearPlex agentic logistics deployment (2025)

BearPlex deployed 47 autonomous agents for a Fortune 100 logistics company in 90 days, generating $14M annualized cost savings. Documented in BearPlex case studies.

Agent compared to alternatives

Alternative	Choose Agent when	Choose alternative when
Chatbot: single-turn LLM that responds to messages without maintaining state or calling tools	The task requires multiple steps, tool calls, or interaction with external systems.	The task is conversational Q&A: cheaper, faster, easier to operate.
Workflow / pipeline: predefined sequence of steps with deterministic logic between them	The steps depend on intermediate observations and can't be fully scripted in advance.	The steps are well-defined and predictable: more reliable, easier to debug, often dramatically cheaper.
RAG (Retrieval Augmented Generation): retrieve relevant documents at query time and inject them into the LLM's context	Retrieval is one of many actions the model needs to take, alongside tool use, planning, and iteration.	The task is essentially 'find documents and answer based on them': single-shot, no iteration needed.

Common pitfalls

Building multi-agent systems before single-agent is exhausted: most 'multi-agent' setups would be simpler and more reliable as single agents with multiple tools.
Skipping evaluation: agents fail in interesting ways. Without a golden dataset of multi-step trajectories, you have no way to detect when agent quality degrades.
Letting the agent run unbounded: production agents need step limits, cost limits, and explicit kill-switches. Runaway agents can rack up thousands of dollars in API calls.
No human checkpoints on consequential actions: send-an-email and write-to-database should require human review or have undo paths. Pure autonomy on consequential actions is malpractice.
Tool design failures: vague tool descriptions, missing input validation, and inconsistent error handling cause cascading agent failures. Tool design quality dominates agent quality.
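Two of the pitfalls above, unbounded runs and missing tool input validation, lend themselves to small guards. The sketch below is illustrative only: `RunBudget`, `BudgetExceeded`, `validate_args`, and all limits and schemas are hypothetical names and values, not part of any framework.

```python
class BudgetExceeded(Exception):
    """Raised when a run hits its step limit, cost limit, or kill switch."""


class RunBudget:
    """Tracks steps and spend for one agent run; all limits are illustrative."""

    def __init__(self, max_steps: int, max_cost_usd: float):
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost_usd = 0.0
        self.killed = False  # set to True by an operator-facing kill switch

    def charge(self, step_cost_usd: float) -> None:
        """Record one step and stop the run if any limit is breached."""
        if self.killed:
            raise BudgetExceeded("kill switch engaged")
        self.steps += 1
        self.cost_usd += step_cost_usd
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost limit ${self.max_cost_usd:.2f} exceeded")


def validate_args(args: dict, schema: dict) -> list:
    """Return a list of problems with tool arguments; empty means valid."""
    problems = [f"missing argument: {k}" for k in schema if k not in args]
    problems += [f"unexpected argument: {k}" for k in args if k not in schema]
    problems += [f"{k} must be {t.__name__}" for k, t in schema.items()
                 if k in args and not isinstance(args[k], t)]
    return problems


budget = RunBudget(max_steps=10, max_cost_usd=0.50)
try:
    while True:  # simulate a runaway loop
        budget.charge(0.20)
except BudgetExceeded as exc:
    print(f"halted after {budget.steps} steps: {exc}")

print(validate_args({"to": "a@example.com", "retries": "3"},
                    {"to": str, "subject": str}))
```

Returning validation problems as plain strings means they can be passed back to the model as an observation, giving it a chance to correct its arguments instead of failing the whole run.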


FAQ

Questions about Agent.

What is the difference between an LLM and an agent?

An LLM is a language model that produces text in response to input. An agent is an LLM-powered system that uses tools, maintains state, and takes multi-step actions to achieve goals. Every agent contains an LLM; not every LLM is an agent. The distinction matters because agents have different operational needs (state management, monitoring, cost controls) than simple LLM API calls.
