Your MCP server probably has too many tools

Dev.to / 4/24/2026

💬 Opinion · Developer Stack & Infrastructure · Ideas & Deep Analysis

Key Points

  • The author built Beacon, Siren's MCP server, and found that although it worked technically, it was practically unusable: the model repeatedly selected the wrong tool and reported results that never happened.
  • The article explains that MCP tools function like a per-turn “menu” the model reads, so having many tools creates more tokens to process and more near-duplicate options, increasing the chance of wrong selections and hallucinations.
  • The author contrasts MCP with REST APIs: REST is built for developers who can infer context from docs and data models, while the model lacks that context and effectively “learns” everything from the exposed tool list.
  • The proposed solution is to dramatically reduce the MCP tool surface (e.g., Beacon to about three tools) and rely on aggressive parameterization so fewer tools cover more use cases.
  • The overall takeaway is that effective MCP design favors a small, tightly parameterized set of tools rather than one tool per REST-like endpoint or CRUD operation.

I built the first working version of Beacon, Siren's MCP server, in an evening while playing video games.

The thing I shipped that night was technically a functioning MCP server, and it was completely unusable. Claude could see the tools. Claude could list the tools. Claude would confidently pick the wrong tool every single time, call it with parameters that didn't match anything real, and then cheerfully lie to the user about what had happened. I'd built a very expensive random number generator and bolted it to my API.

Turns out, MCP servers are different from REST APIs for a reason, and I was in the middle of learning exactly why.

MCP !== REST

Here's the assumption I walked in with. I had a REST API. MCP tools are just things an agent can call. Therefore, one MCP tool per REST endpoint, right? Right? Siren has a lot of endpoints, so Beacon shipped with somewhere between 30 and 40 tools on day one. Every CRUD operation. Every specialized lookup. Every listing variation. All of it, exposed.

It hallucinated constantly, picked the wrong tools, and honestly just wasn't working.

What I didn't understand yet was that an MCP tool surface isn't an API. It's a menu the model reads every single turn. More items on the menu means more tokens burned on tool descriptions, more near-duplicate options for the model to pick between, and more opportunities for it to grab the almost-right one.
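To make the menu cost concrete, here's a rough sketch. The tool names are made up, and the four-characters-per-token heuristic is only a crude approximation, but it shows how a wide surface inflates what the model has to re-read every turn compared to one parameterized tool:

```python
import json

# Hypothetical tool definitions, not Beacon's real schema.
# A wide surface: one narrow tool per endpoint variation.
wide_surface = [
    {"name": f"search_{kind}", "description": f"Search Siren {kind} by keyword."}
    for kind in ["programs", "distributors", "contacts", "orders", "invoices", "notes"]
]

# A narrow surface: one tool, with the variation pushed into a parameter.
narrow_surface = [{
    "name": "search",
    "description": "Search Siren records by keyword.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "contentType": {
                "type": "string",
                "enum": ["programs", "distributors", "contacts",
                         "orders", "invoices", "notes"],
            },
        },
        "required": ["query", "contentType"],
    },
}]

def rough_tokens(tools):
    # Crude heuristic: roughly 4 characters per token for English/JSON text.
    return len(json.dumps(tools)) // 4

print(rough_tokens(wide_surface), rough_tokens(narrow_surface))
```

The narrow surface carries the same information, but the model reads one description instead of six near-duplicates, and the real gap widens fast once each tool has a full description and schema.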

I imagine the experience for the AI agent was similar to what happened to me when I first opened Blender. I've been a long-time CAD user, but I'd never used Blender. I knew about 3D modeling, understood the concepts, but staring at a wall of unfamiliar tools that all seemed to do slightly different things overwhelmed me, and I had no idea what to reach for to complete my task.

That's basically what a large-surface MCP server looks like from the model's side.

The REST API is for developers who can read docs, hold a data model in their head, and know that GET /programs/{id}/distributors is different from GET /distributors?program_id={id} even though both technically work. The model doesn't have that context. It's picking from the menu, and the menu is its entire world.

The fix: fewer tools, aggressive parameterization

After the first version embarrassed itself in front of me enough times, I did some research. The pattern came up over and over: the MCPs people actually use in production have small surfaces. Like, shockingly small.

So I reimagined the whole thing. Beacon now has three tools. Maybe four, depending on how you count. Every other MCP I actually use day-to-day has a similarly tiny count. That's not an accident. That's the pattern.

The move is to parameterize aggressively instead of creating separate tools for variations. One search tool with a contentType parameter, not six search tools for each content type. One fetch tool that takes a list of IDs, not one per resource type. The tool doesn't map to a row in your endpoint list. It maps to a verb the user would actually say out loud.
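As a sketch of what that looks like in practice (the handler names and the tiny in-memory record store are hypothetical, not Siren's actual API):

```python
# Hypothetical handlers for a small, parameterized MCP surface.
RECORDS = {
    "prog_1": {"type": "program", "name": "Spring Launch"},
    "dist_1": {"type": "distributor", "name": "Acme Co"},
    "dist_2": {"type": "distributor", "name": "Globex"},
}

def search(query: str, content_type: str) -> list[dict]:
    """One search tool; the content type is a parameter, not six tools."""
    return [
        {"id": rid, **rec}
        for rid, rec in RECORDS.items()
        if rec["type"] == content_type and query.lower() in rec["name"].lower()
    ]

def fetch(ids: list[str]) -> list[dict]:
    """One fetch tool that takes a list of IDs, not one tool per resource type."""
    return [{"id": rid, **RECORDS[rid]} for rid in ids if rid in RECORDS]

print(search("acme", "distributor"))
print(fetch(["prog_1", "dist_2"]))
```

Two verbs, "search" and "fetch", cover what would otherwise be a dozen endpoint-shaped tools.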

It's not just the verbs, either. It's how the data comes back. By parameterizing the types and the fields the agent wants in the response (much like how GraphQL works), you help it put together token-efficient request and response cycles that return exactly what it needs and nothing more. That saves you money in tokens, and makes the agent more reliable in longer conversations where context window pressure compounds.
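A minimal sketch of that field selection, with made-up field names:

```python
# Sketch of GraphQL-style field selection on a tool response.
# The record and its field names are illustrative, not Siren's schema.
FULL_RECORD = {
    "id": "dist_1",
    "name": "Acme Co",
    "email": "ops@acme.example",
    "created_at": "2026-01-02",
    "notes": "A very long free-text field the agent rarely needs...",
}

def fetch_record(record, fields=None):
    """Return only the requested fields, so the response burns fewer tokens."""
    if not fields:
        return record
    return {k: v for k, v in record.items() if k in fields}

print(fetch_record(FULL_RECORD, fields=["id", "name"]))
```

When the agent only needs an ID and a name, it asks for an ID and a name, and the long free-text fields never touch the context window.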

The difference was literally night and day. Before, I had an MCP server that was set up, configured, and completely ignored by every flagship agent I pointed at it. After, I had something I use every day for my own work, which is honestly the highest bar I know how to apply.

The test I use now, when I'm tempted to add a tool, is this: could I get the same behavior by adding a parameter to an existing tool? If yes, I add the parameter. Almost always, I can add the parameter.

Why this is so easy to get wrong

The reason smart people keep building 40-tool MCP servers is that every single tool, in isolation, seems justified. "I need a tool for this specific lookup because the parameters are different." "I need a separate tool here because the response shape is different." "I need this one because it's read-only and that one writes." Each decision is local. Each decision is defensible. You end up with a surface that nobody would have designed on purpose if they'd drawn it up on a whiteboard first.

The thing that breaks you isn't any single tool. It's the shape of the whole menu.

If I'd started with the constraint "three tools, figure out how to fit everything into them," I would've made very different choices about parameters from day one. That's the mental model I work from now when I start a new MCP. Pick the verbs first. Jam everything through them. Only split when splitting is genuinely unavoidable (and be suspicious of yourself when you decide it is).

Connecting your MCP to web agents

The biggest technical challenge wasn't the tool design. It was OAuth.

I write everything in PHPNomad these days. I work both in and out of WordPress, so from the start I wanted my MCP systems to live on a stack I could use in either context. PHPNomad is the framework I built to make that possible, and Beacon was always going to be built on it. The "nomadic" part means the same core logic runs as a microservice, a WordPress plugin, or anything else PHP can host, without rewriting the thing.

That's the good part. The rough part was that PHPNomad didn't have OAuth support when I started Beacon, so I had to figure out and build my own flow. That took real effort. It was also unavoidable, because Claude and the other flagship agents require an OAuth flow to connect. No OAuth, no production MCP, no sales conversations where a prospect pastes a URL into Claude and starts talking to Siren.
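For a sense of what "an OAuth flow" actually implies, here's a sketch of the authorization-server metadata document (RFC 8414) that clients typically fetch from a server's /.well-known/oauth-authorization-server path before connecting. The URLs are placeholders, not Beacon's real endpoints:

```python
import json

# Placeholder OAuth 2.0 authorization-server metadata (RFC 8414).
# Every endpoint listed here is something the server has to implement.
metadata = {
    "issuer": "https://auth.example.com",
    "authorization_endpoint": "https://auth.example.com/oauth/authorize",
    "token_endpoint": "https://auth.example.com/oauth/token",
    "registration_endpoint": "https://auth.example.com/oauth/register",
    "response_types_supported": ["code"],
    "grant_types_supported": ["authorization_code", "refresh_token"],
    "code_challenge_methods_supported": ["S256"],  # PKCE
}
print(json.dumps(metadata, indent=2))
```

Authorization, token exchange, client registration, PKCE: each line in that document is a piece of security-sensitive machinery you either inherit from your stack or build yourself.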

Honestly, if you're not already comfortable building OAuth flows, consider a different stack. OAuth is not a weekend project. It's a security boundary, and a wrong implementation can bite you in ways that don't show up until you have real traffic. I built my own because I needed it for PHPNomad to move forward, and because I've built a few OAuth systems over my career at places like GoDaddy. If you don't have that kind of experience in your tool belt, pick a stack where OAuth already works and let someone else carry that weight for you.

PHPNomad has OAuth now, which means the next MCP I build on top of it is dramatically cheaper to ship. Pay the cost once in infrastructure, get it for free forever after.

Save yourself the heartache: don't test with voice mode

One last thing, because I wish someone had told me. Voice mode in most AI agents on the web doesn't actually use your MCP tools, even though the UI says it can see them, and even though the agent itself thinks it can see them. It can't. Voice interfaces only work with a subset of the MCP surface as of today (looking at you, every major vendor).

So after you get your MCP configured, resist the urge to test it by talking to it. Use a normal text interface first. Confirm your tools actually get called. Confirm the agent is reading parameters correctly. Then flip on voice mode and see how much of your surface survives.

Ask me how I know.

The lesson, compressed

If you're about to ship an MCP server, or you've shipped one and it's not behaving, the question is not "what system prompt do I need?" It's "how many tools am I exposing, and what happens if I cut that number by 90%?"

Beacon went from 30-40 tools that didn't work to 3-4 tools I use every day. The product didn't change. The API didn't change. The surface I exposed to the model changed, and that was the whole thing.

If you're building on Siren and want to see the small-surface pattern in action, Beacon is live. Connect it, poke at it, read the tool descriptions. It's not a big file. That's the point.