
Why MCP alone isn’t enough for reliable agent tool use

April 14, 2026

AI agents deliver the most value when they can see and act across your entire tech stack, from refunding orders to rescheduling bookings to looking up loyalty points. Model Context Protocol (MCP) made that connectivity dramatically easier, giving teams a standard way to package and expose tools across their stack.

But connectivity is not the same as reliability. After working with dozens of enterprise customers, we've seen the same mistake play out repeatedly: teams treat MCP as a one-step solution, import a server as-is, and expect everything to work. In production, it doesn't. Bridging that gap requires an infrastructure layer purpose-built to curate, scope, and evaluate how tools are actually used.

The trap of importing an MCP server "as is"

Many teams initially try to expose their full tool stack through an MCP server, hoping the agent will figure out on its own which tools are relevant. This makes sense in theory but becomes problematic in production.

Imagine an MCP server with 30 to 40 endpoints: refunds, order lookup, rewards, offers, profile management, and more. On every turn, the agent has to choose from the entire tool stack, even though only a few are relevant to the user's intent. With such a large surface area, a few things happen:

  • Tool selection accuracy drops
  • Response latency and costs increase due to unnecessary API calls
  • Mistakes and tool call failures become difficult to trace and debug

The pattern we see across customers is the same: start with full tool access, watch accuracy and latency degrade under production traffic, then revert to manually curating tools per workflow. Relying on MCP alone erodes its value and forces teams back into time-consuming manual processes.

While it’s tempting to treat tool-calling accuracy as a limitation of LLMs, it’s solvable through architectural design. Agents struggle to reason over a large, unbounded action space, and the more tools you expose, the harder the selection problem becomes. By constraining the scope of the selection process, the right choice becomes more obvious.

How Decagon uses MCP

At Decagon, we treat MCP as a registry of tools, not as the final interface between your agent and your systems.

Tool discovery as a critical curation step

Raw MCP schemas are designed for systems, not agents. When we introspect a customer's MCP server and surface its tools, we consistently find that the schemas are structured in ways that introduce ambiguity at inference time. A tool named “updateUser” might be perfectly legible to an engineer, but to an agent reasoning over intent, it's an underspecified action space. Does it update contact information? Loyalty status? Communication preferences? That ambiguity compounds under production traffic.

Decagon addresses this through schema refinement, splitting overly generic tools into narrower, intent-scoped definitions that reduce the cognitive load on the model. Guardrails are also layered on top of MCP definitions to enforce required fields, constrain enums, and align tool behavior with global guidance specified by customers. Critically, none of this touches the source MCP server. It's all tracked in Decagon's agent versioning layer, so customers get full ownership of refinement without fragmentation.
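To make the idea concrete, here is a minimal sketch of this kind of schema refinement. The tool names, fields, and the refinement function are all hypothetical illustrations, not Decagon's actual API; the point is that one generic tool becomes several intent-scoped definitions with required fields and constrained enums, while the source MCP schema is never modified.

```python
# A generic MCP-style tool schema, as it might be introspected from a server.
# (Illustrative example; not a real customer's schema.)
GENERIC_TOOL = {
    "name": "updateUser",
    "description": "Update a user record.",
    "inputSchema": {
        "type": "object",
        "properties": {"userId": {"type": "string"}, "fields": {"type": "object"}},
        "required": ["userId"],
    },
}

def refine_update_user(generic):
    """Split one generic tool into narrower, intent-scoped definitions.

    Each refined tool names a single intent, enforces required fields,
    and constrains free-form values to enums. The source schema in
    `generic` is left untouched; refinements live in a separate layer.
    """
    return [
        {
            "name": "update_contact_info",
            "description": "Change a user's email address or phone number.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "userId": {"type": "string"},
                    "email": {"type": "string"},
                    "phone": {"type": "string"},
                },
                "required": ["userId"],
            },
        },
        {
            "name": "update_communication_preferences",
            "description": "Opt a user in or out of a communication channel.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "userId": {"type": "string"},
                    "channel": {"type": "string", "enum": ["email", "sms", "push"]},
                    "optedIn": {"type": "boolean"},
                },
                "required": ["userId", "channel", "optedIn"],
            },
        },
    ]

refined = refine_update_user(GENERIC_TOOL)
```

Each refined definition answers the question the generic one left open: an agent choosing between update_contact_info and update_communication_preferences no longer has to infer what "update" means from context.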

Accuracy from constrained scope, not better selection

The single biggest driver of tool-calling accuracy is not model capability; it's how much of the tool surface the agent has to reason over at any given moment. When an agent must choose from 30+ tools on every turn, selection degrades even when only two or three are relevant. The selection problem scales with surface area.

The fix is routing logic paired with explicit, intent-scoped tool sets. Decagon implements this through Agent Operating Procedures (AOPs). For each AOP, customers define a constrained toolset alongside the routing logic and guardrails that govern when and how it’s invoked. The agent never operates against a flat, global list but instead sees only what's relevant to the intent it's handling. From there, the model either selects among the narrowed tools or directly calls a specific one if explicitly instructed. At that scope, the selection problem is tractable, and the model's reasoning is precise.
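The shape of this routing can be sketched in a few lines. Everything here is illustrative (the intent names, tool names, and the AOPS mapping are assumptions, not Decagon's actual implementation); it shows only the core move of filtering the global registry down to an intent-scoped toolset before the model sees anything.

```python
# Hypothetical intent-to-toolset mapping: each entry plays the role of an
# AOP's constrained toolset. Names are illustrative, not a real API.
AOPS = {
    "refund_request": ["lookup_order", "issue_refund"],
    "loyalty_inquiry": ["lookup_loyalty_points", "list_active_offers"],
    "booking_change": ["lookup_booking", "reschedule_booking"],
}

# The full registry of tools ingested from the MCP server. In practice this
# could be 30+ definitions; a handful suffice to show the idea.
ALL_TOOLS = [{"name": n} for n in [
    "lookup_order", "issue_refund", "lookup_loyalty_points",
    "list_active_offers", "lookup_booking", "reschedule_booking",
    "update_contact_info", "lookup_user_profile",
]]

def tools_for_turn(intent, all_tools):
    """Return only the tools scoped to the routed intent.

    Instead of reasoning over the entire registry on every turn, the
    model chooses among two or three candidates, which keeps the
    selection problem tractable.
    """
    allowed = set(AOPS.get(intent, []))
    return [t for t in all_tools if t["name"] in allowed]
```

A refund turn would then expose only lookup_order and issue_refund to the model, while an unrecognized intent exposes nothing, forcing an explicit routing decision rather than a guess over the full surface.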

Reliability through ongoing evaluation

Getting tool selection right on day one isn't enough. Prompt changes, model swaps, and schema updates all affect tool-calling behavior in ways that aren't immediately visible. Without systematic test coverage at the tool level, regressions are silent, and in a customer-facing agent, silent regressions are expensive.

Every AOP ships with evaluations that assert two things: that the correct tool was selected for a given intent, and that the correct arguments were constructed. For example, evaluation models check that the right account ID, booking reference, or currency was referenced in the tool call. When an AOP or tool definition is modified, the dependent tests re-run automatically before the change is promoted. This built-in evaluation is what lets tools ingested through MCP be deployed reliably at scale.
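A minimal sketch of what such a tool-level evaluation case checks, under stated assumptions: the case format, the run_eval helper, and the stubbed agent are all hypothetical stand-ins for illustration, not Decagon's evaluation framework. The two assertions mirror the two guarantees above: right tool, right arguments.

```python
# Hypothetical evaluation case: fixes an input utterance and asserts both
# the selected tool and the arguments the agent constructed for it.
EVAL_CASES = [
    {
        "utterance": "I was double charged for order 81422, please refund it",
        "expected_tool": "issue_refund",
        "expected_args": {"order_id": "81422", "currency": "USD"},
    },
]

def run_eval(case, agent_call):
    """Run one case. agent_call maps an utterance to (tool_name, args).

    Returns a list of error strings; an empty list means the case passed.
    """
    tool, args = agent_call(case["utterance"])
    errors = []
    if tool != case["expected_tool"]:
        errors.append(f"selected {tool}, expected {case['expected_tool']}")
    for key, want in case["expected_args"].items():
        if args.get(key) != want:
            errors.append(f"arg {key}={args.get(key)!r}, expected {want!r}")
    return errors

# A stubbed agent standing in for the real model, to show the check's shape.
def fake_agent(utterance):
    return "issue_refund", {"order_id": "81422", "currency": "USD"}

assert run_eval(EVAL_CASES[0], fake_agent) == []
```

Because each case is tied to the AOP and tool definitions it exercises, a schema or prompt change can trigger exactly the dependent cases, turning silent regressions into failing tests before promotion.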

Connectivity is just the start

MCP solves a connectivity problem, but it doesn't solve the curation problem. Deciding which tools an agent should use, for which intents, and how to verify it's all working is where the real leverage is. The teams shipping reliable agents in production aren't the ones with the most integrations. They're the ones who treat tool selection as an engineering discipline: scoped, tested, and systematically improved over time.
