Skip to main content
AI engineering glossary

What is Function Calling in LLMs?

Function calling is the LLM capability to generate structured JSON output that invokes external functions or APIs, enabling the model to read databases, call services, run computations, and take real-world actions instead of producing only natural-language text.

Last updated 2026-04-28BearPlex AI Engineering Team

Overview

Function calling (also called tool use, tool calling, or in some APIs JSON mode) is the bridge between LLM reasoning and the rest of your system. OpenAI introduced it in June 2023; Anthropic, Google, and most other providers followed within months. Function calling is the foundation of every modern AI agent: instead of the LLM only producing text, it produces a structured request like 'call get_weather with city=Boston', your code executes that call, and the result is fed back into the model's next turn. Reliability of function calling is now a key differentiator between frontier models, and a critical engineering decision for production agent systems.

How function calling works

You define a set of functions in your API request: each with a name, description, and JSON schema for parameters. The model is also given the user's request. During inference, the model decides whether to respond with natural-language text or with a structured function call. If it picks a function call, it returns JSON specifying the function name and arguments. Your code parses the JSON, executes the actual function (calling a database, hitting an API, running a computation), and feeds the result back into the next model turn. The model then produces either a natural-language answer for the user or another function call. This loop continues until the model decides it has enough information to answer.

Function calling vs structured output vs MCP

Three closely related concepts often get conflated: (1) Structured output (JSON mode) forces the model to produce JSON matching a schema, but doesn't necessarily mean the JSON is a function call; it might be data extraction. (2) Function calling specifically means the model decides which function to invoke and with what arguments. (3) MCP (Model Context Protocol) is a standardized way to expose functions to LLMs across providers, eliminating the need to redefine function schemas for each model. In production we use all three: structured output for data extraction tasks, function calling for agent loops, and MCP servers when the same tools need to work across Claude, GPT, and Gemini.

Reliability and parallel function calling

Early function calling (2023) was unreliable: models would hallucinate function names, pass wrong argument types, or skip required parameters. Frontier models in 2026 reach 95%+ correctness on well-specified function schemas. The remaining failure modes are typically (1) ambiguous function descriptions where the model picks the wrong function, (2) chained calls where the model uses the result of call A incorrectly in call B, and (3) overcalling: invoking functions when the answer was already in context. Parallel function calling (the model emits multiple independent function calls in one turn for concurrent execution) became standard in 2024 and dramatically reduces agent latency for read-heavy tasks.

Use cases

  • Agent loops that read databases, call APIs, and take actions
  • Structured data extraction from natural language (parse an invoice, extract entities)
  • Routing decisions where the model picks which downstream service to call
  • Multi-step reasoning where the model breaks a task into discrete tool calls
  • Search-augmented chat where the model decides when to call search vs answer from memory

Examples in production

OpenAI

Introduced function calling in June 2023 with GPT-3.5-turbo and GPT-4, enabling structured tool invocation as a first-class API feature.

Source

Anthropic

Claude tool use (Anthropic's term for function calling) reached production GA in May 2024 with parallel tool use support.

Source

BearPlex (Letti AI)

Built a production agent system for Letti AI using function calling for booking actions, calendar lookups, and CRM writes: handling 10K+ daily agent loops.

Function Calling compared to alternatives

AlternativeChoose Function Calling whenChoose alternative when
Plain prompting
Asking the model for a structured answer in natural language and parsing the response
Use function calling for any production system that needs reliable structured outputPlain prompting only for one-off scripts or models without function calling support
Direct API calls (no LLM)
Hard-coded conditional logic that decides which function to call
Use function calling when the routing decision requires natural-language understandingUse direct logic when the inputs are structured and rules-based

Common pitfalls

  • Vague function descriptions: models pick the wrong function when descriptions don't clearly distinguish use cases
  • Missing JSON schema validation: the model can hallucinate fields not in the schema; always validate the call before executing
  • Long function lists: past ~20 functions, model accuracy degrades; group related functions or use a router model
  • Forgetting error handling: when a function fails, the model needs the error message to recover, not a silent failure
  • Trusting the model with destructive actions without human-in-the-loop confirmation
FAQ

Questions about Function Calling.

They're the same thing, named differently by different vendors. OpenAI, Google, and Mistral call it 'function calling.' Anthropic calls it 'tool use.' The mechanic is identical (the model emits structured JSON that invokes a function) and the code patterns translate directly between providers.

Frontier models in 2026 handle 50+ well-defined functions in a single API call without major accuracy degradation. Past that, you usually want a hierarchical structure: a router model that picks a category, then a specialist call with the relevant subset.

MCP is a protocol for defining tools once and exposing them to multiple LLM providers. Use MCP when (1) the same tools need to work across Claude, GPT, and Gemini, or (2) you want to share tools across applications. For single-provider single-application work, native function calling is fine and often faster to implement.

Yes: that's a real failure mode called hallucinated parameters. Frontier models in 2026 are mostly past this, but ALWAYS validate function arguments against your JSON schema before executing the call. Never let an unvalidated model output reach a database, API, or filesystem.

Work with BearPlex

Need help implementing Function Calling?

BearPlex builds production AI systems that use Function Calling for Fortune 500s and high-growth scale-ups. Outcome-based pricing. 90-day embedded sprints.