
Too Many MCP Servers? How to Build a Lean, Efficient Stack

Your MCP setup is eating your context window. Here's how to audit your servers, cut token bloat, and build a lean stack that actually makes your AI assistant smarter.

Tags: mcp · token-optimization · best-practices · stacks

A developer on Reddit shared their Claude Code setup: 4 MCP servers, 167 tools, roughly 60,000 tokens consumed before they typed a single prompt. Their $2 coding session turned into a $47 nightmare. And the worst part? After a certain threshold, adding more tools does not make your AI assistant smarter. It makes it worse. Tool selection accuracy drops as the model has to sift through hundreds of tool definitions to find the right one for each step.

If your MCP setup feels bloated, slow, or expensive, you are not imagining it. The problem is structural, and the fix is straightforward.

The Hidden Cost of Every MCP Server

Here is what happens when your AI client starts a session with MCP servers enabled: every single tool from every server gets preloaded into the context window. Not just the tool names. The full definitions -- names, descriptions, complete JSON schemas, every parameter with its type, constraints, and documentation. All of it, whether you use those tools or not.
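To make that concrete, here is a rough sketch of what one preloaded tool definition looks like and what it costs. The `read_file` schema below is a hypothetical example, and the 4-characters-per-token divisor is a common heuristic, not an exact tokenizer:

```python
import json

# Hypothetical tool definition in the shape MCP servers expose:
# a name, a description, and a full JSON Schema for the inputs.
tool_definition = {
    "name": "read_file",
    "description": "Read the contents of a file from the local filesystem.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Absolute path to the file.",
            },
            "encoding": {
                "type": "string",
                "description": "Text encoding to use.",
                "default": "utf-8",
            },
        },
        "required": ["path"],
    },
}

# Rough estimate: ~4 characters per token is a common rule of thumb.
serialized = json.dumps(tool_definition)
estimated_tokens = len(serialized) // 4
print(f"~{estimated_tokens} tokens for one small tool definition")
```

Even this minimal two-parameter tool costs around a hundred tokens; real servers with longer descriptions and richer schemas routinely run several hundred tokens per tool.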

The numbers are stark:

  • 4 MCP servers with 167 tools consumed ~60K tokens in a documented Claude Code session. That is before any code, any conversation, any actual work.
  • Some power-user setups hit 150K+ tokens from tool definitions alone. On a 200K context window, that leaves almost nothing for the actual task.
  • A reasonable estimate is ~500 tokens per tool. A server with 25 tools costs you roughly 12,500 tokens just by existing in your config.

Clients are starting to impose hard limits. Cursor caps at 40 MCP tools. GitHub Copilot caps at 128. These limits exist because the providers know that unbounded tool counts degrade performance.

Claude Code introduced Tool Search in January 2026, which sharply cut tool definition bloat -- from 51K tokens down to 8.5K in one benchmark, a reduction of more than 80%. That is a significant improvement, but it is a protocol-level optimization, not a fix for a fundamentally overloaded setup. If you are running 10 servers with 200+ tools, no amount of clever caching will save you from slow, confused, and expensive sessions.

Why More Tools Make Your Agent Worse

This is the part that surprises people. Intuitively, you would expect that giving an AI assistant access to more tools would make it more capable. The opposite is true past a certain point.

When a model sees 200 tool definitions in its context, it has to evaluate all of them for every action it takes. The probability of selecting the wrong tool increases. The model spends more reasoning tokens deciding between overlapping tools. And because context space is finite, those tool definitions crowd out the actual information the model needs -- your code, your conversation history, the output from previous steps.

Think of it like a workbench. A well-organized bench with the right tools for the job makes you faster. A bench buried under every tool from every aisle of the hardware store makes you slower, even though technically you have more capabilities.

The Audit Framework: Clean Up Your Stack in 15 Minutes

If you suspect your MCP setup has grown beyond what is useful, here is a practical process to trim it down.

Step 1: Count Your Tools

Check how many tools each of your MCP servers exposes. Most clients show this in their MCP configuration panel or logs. If not, check the server's documentation or repository.

Add them up. If you are over 50 tools total, you are almost certainly carrying dead weight. Over 100, and you are actively hurting performance.
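If your client exposes raw server responses, the counting can be scripted. This sketch tallies tools from captured MCP `tools/list` responses -- the `tools` array shape comes from the MCP spec, but the server names and counts here are invented for illustration:

```python
# Tally tools per server from captured MCP `tools/list` responses.
# The response shape (a "tools" array of definitions) follows the MCP
# spec; these particular servers and counts are made-up examples.
responses = {
    "filesystem": {"tools": [{"name": f"fs_tool_{i}"} for i in range(11)]},
    "github":     {"tools": [{"name": f"gh_tool_{i}"} for i in range(26)]},
    "playwright": {"tools": [{"name": f"pw_tool_{i}"} for i in range(21)]},
}

counts = {server: len(resp["tools"]) for server, resp in responses.items()}
total = sum(counts.values())

for server, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{server:12s} {n:3d} tools")
print(f"{'TOTAL':12s} {total:3d} tools")
if total > 50:
    print("Over the 50-tool line -- time to trim.")
```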

Step 2: Calculate Your Token Budget

Multiply your total tool count by 500. That is your approximate token overhead from MCP tool definitions alone.

Now compare that to your model's context window. Claude's is 200K tokens. If your MCP overhead is 40K, that is 20% of your context consumed before you start working. For models with smaller context windows, the ratio gets worse fast.

A good target: keep your MCP token overhead under 15% of your total context window. For a 200K window, that means roughly 30K tokens, or about 60 tools maximum.
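The budget check above fits in a few lines. The 500-tokens-per-tool figure and the 15% ceiling are this article's estimates, not protocol constants:

```python
TOKENS_PER_TOOL = 500   # rough per-tool estimate used in this article
BUDGET_FRACTION = 0.15  # suggested ceiling: 15% of the context window

def mcp_overhead(tool_count: int, context_window: int) -> tuple[int, float, bool]:
    """Return (token overhead, fraction of context used, within budget?)."""
    overhead = tool_count * TOKENS_PER_TOOL
    fraction = overhead / context_window
    return overhead, fraction, fraction <= BUDGET_FRACTION

# Example: 80 tools on a 200K-token window.
overhead, fraction, ok = mcp_overhead(tool_count=80, context_window=200_000)
print(f"{overhead:,} tokens = {fraction:.0%} of context; within budget: {ok}")
```

Run it against your own tool count; any setup that comes back over budget is a candidate for the overlap and dead-weight checks in the next two steps.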

Step 3: Check for Overlap

This is where most of the waste hides. Look at what each server actually does and ask: do I have multiple servers that perform the same function?

Common overlaps:

  • File system access: Multiple servers that read and write files. You probably only need one.
  • Web browsing: Playwright MCP, Browserbase MCP, Puppeteer MCP -- pick the one that matches your workflow.
  • Search: Several servers include some form of web or documentation search. Consolidate to one.
  • Database access: If you have both a generic Postgres MCP and a Supabase MCP, the Supabase server likely covers everything the generic one does, plus more.

Every overlapping tool is wasted tokens and increased confusion for the model.
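A quick way to surface overlaps is to label each server with capability categories and intersect them. The labels are manual annotations you assign from each server's tool list -- MCP does not report them -- and the servers below are illustrative:

```python
from itertools import combinations

# Capability categories are manual annotations, not something MCP
# reports; label each server yourself from its tool list.
servers = {
    "filesystem":  {"files"},
    "github":      {"git", "files", "search"},
    "playwright":  {"browser"},
    "browserbase": {"browser"},
    "brave-search": {"search"},
}

overlaps = []
for (a, caps_a), (b, caps_b) in combinations(servers.items(), 2):
    shared = caps_a & caps_b
    if shared:
        overlaps.append((a, b, shared))
        print(f"{a} and {b} overlap on: {', '.join(sorted(shared))}")
```

Every pair this prints is a decision to make: keep the server that covers more of your workflow and drop the other.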

Step 4: Remove "Just in Case" Servers

Be honest with yourself. If you added a server because it sounded useful but you have not triggered any of its tools in the past week, remove it. You can always add it back when you actually need it.

MCP servers are cheap to configure. Adding one takes 30 seconds. There is no reason to carry the token cost of a server you are not using just because you might need it someday.
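For reference, a single-server entry in Claude Code's project-level `.mcp.json` looks like this, assuming the standard `mcpServers` shape (the GitHub server shown is the reference implementation; swap in whatever server you actually use):

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```

Removing a server is deleting its entry; adding it back later is pasting the entry in again.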

Building a Lean Stack: The Right Servers for the Job

The developers who get the most out of MCP are not the ones with the most servers. They are the ones whose servers match their actual workflow with zero overlap.

Here is the approach that works:

Start with 3-4 servers maximum for any given workflow. This is not an arbitrary limit. With 3-4 well-chosen servers, you typically land at 20-50 tools and 10K-25K tokens of overhead. That leaves 85-95% of your context window for actual work.

Match servers to what you do daily, not what you might do quarterly. If you deploy to Vercel every day, the Vercel MCP server earns its token cost. If you deploy once a month, it does not. Add it when you need it, remove it when you are done.

Use pre-optimized stacks instead of assembling from scratch. This is why we built the curated stacks on stackmcp.dev. Each stack is designed for a specific workflow with servers that complement each other without overlapping.

Some examples:

  • Frontend Developer stack: 4 servers, ~23K tokens. Playwright for testing, Context7 for docs, Magic MCP for components, Figma for design specs. That is under 12% of Claude's context window while covering the entire frontend workflow.
  • Indie Hacker stack: 5 servers, ~41K tokens. Supabase, Stripe, GitHub, Vercel, Sentry. Heavier, but each server eliminates a major category of context-switching for a solo SaaS developer.
  • Fullstack Web stack: Covers both frontend and backend workflows with carefully selected servers that avoid the overlap trap.

The key insight: a curated stack of 4 servers will outperform a random collection of 10 servers every time. Fewer tools means faster tool selection, more context space for your code, and lower costs per session.

What About Dynamic Tool Loading?

You might be wondering if the solution is to wait for better tooling. Claude Code's Tool Search, mentioned earlier, is a step in the right direction -- it indexes tools and only loads relevant definitions into context based on the current task. That reduction in preloaded definitions is real and meaningful.

But dynamic loading does not eliminate the need for curation. Even with Tool Search enabled, the model still needs to index and reason about your available tools. More servers mean more indexing overhead, more potential for confusion, and more things that can break. A lean, well-chosen stack benefits from these optimizations more than a bloated one does.

The best approach is both: curate your servers aggressively, and let protocol-level improvements compound the gains.

Start Here

If you are reading this because your MCP setup feels slow, expensive, or unreliable, here is what to do right now:

  1. Audit your current setup using the four-step framework above. Count your tools, calculate the token cost, find overlaps, and cut the dead weight.
  2. Pick a curated stack that matches your primary workflow. Browse the stacks on stackmcp.dev to find one designed for your use case.
  3. Compare servers if you are unsure which ones to keep. The compare page lets you see tool counts, token costs, and capabilities side by side.

Your AI assistant gets smarter when you give it less to think about and more room to work. A lean MCP stack is not a compromise -- it is an upgrade.
