When you add a tool to your MCP, here’s what happens:
Every tool you define sends its description and input schema to the LLM as part of the prompt. The LLM reads these definitions to decide whether to call a tool and which one.
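Concretely, each tool reaches the model as a name, a description, and a JSON Schema, and every character of it counts against your input budget on every request. A sketch of the shape, following the MCP `tools/list` result format:

```typescript
// What the model sees for ONE tool (shape per MCP's `tools/list` result).
// All of this is resent as input tokens on every request.
const toolDefinition = {
  name: "search_products",
  description: "Search products by name",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search terms" }
    },
    required: ["query"]
  }
};
```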
The problem: Every tool you add increases your input token count. This has two costs:
- Context space — LLMs have finite context windows. More tokens in tool definitions = fewer tokens for conversation history and responses.
- Money — Input tokens cost money. Every request pays for all your tool definitions, whether they’re used or not.
An MCP with 50 tools and verbose descriptions can easily consume 2,000-5,000 tokens per request — before the user even says anything.
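To see where your server lands, you can approximate the overhead by tokenizing the serialized definitions. A sketch using the `js-tiktoken` package; the count is approximate, since each model tokenizes (and each client serializes) differently:

```typescript
import { getEncoding } from "js-tiktoken";

// Rough estimate: serialize your tool definitions and count tokens.
// cl100k_base is a stand-in encoding; real counts vary by model.
const enc = getEncoding("cl100k_base");

function toolTokenCost(tools: object[]): number {
  return enc.encode(JSON.stringify(tools)).length;
}

const tools = [
  {
    name: "search_products",
    description: "Search products by name",
    inputSchema: { type: "object", properties: { query: { type: "string" } } }
  }
  // ...the rest of your tools
];
console.log(`~${toolTokenCost(tools)} tokens of definitions per request`);
```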
This guide covers practical strategies to minimize token usage and keep your MCPs lean.
The Problem: APIs vs MCPs
| REST APIs | MCPs |
|---|---|
| Consumed by developers | Consumed by AI agents |
| Need detailed documentation | Need minimal hints |
| Stateless, hundreds of endpoints | Focused, purpose-built tools |
| Verbose descriptions prevent confusion | Verbose descriptions waste tokens |
Most MCPs are auto-generated from OpenAPI specs with paragraphs of explanations designed for humans. Agents don’t need this. They have knowledge in their weights. A brief description + clear schema is enough.
Auto-generated MCPs from OpenAPI specs typically serve agents poorly. Don't just wrap your API; optimize it for AI.
Strategy 1: Expose Only What You Need
Your API might have hundreds of endpoints, but your MCP shouldn’t.
```typescript
// BAD: 50+ tools for every endpoint
// GOOD: Only what agents actually use
@Tool({ description: "Search products" })
async searchProducts(input: { query: string }) { ... }

@Tool({ description: "Add to cart" })
async addToCart(input: { productId: string }) { ... }
```
Strategy 2: Optimize Output Size
APIs return paginated results with hundreds of items. MCPs should return minimal useful responses.
```typescript
// BAD: Returns 100 results
async search(query: string) {
  return await api.search(query, { pageSize: 100 });
}

// GOOD: Top 5 relevant results only
async search(query: string) {
  const results = await api.search(query);
  const sorted = rankByRelevance(results, query);
  return sorted.slice(0, 5).map(r => ({
    id: r.id,
    title: r.title,
    snippet: r.snippet.slice(0, 200)
  }));
}
```
Use cursors for large datasets — never return unbounded queries. The Exa AI MCP uses NLP to extract only relevant chunks instead of full HTML pages.
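A cursor-based version of the search tool might look like the sketch below; `api.search` and its paging options stand in for your backend:

```typescript
// Sketch: bounded pages with an opaque cursor instead of unbounded dumps.
async search(input: { query: string; cursor?: string }) {
  const page = await api.search(input.query, { after: input.cursor, pageSize: 5 });
  return {
    results: page.items.map(r => ({ id: r.id, title: r.title })),
    // Present only when more data exists; the agent sends it back if needed.
    nextCursor: page.hasMore ? page.nextCursor : undefined
  };
}
```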
Strategy 3: Combine Multi-Step Workflows
Every tool call is a round trip: the AI generates a call → the server responds → the AI processes the result → repeat. A chain of calls is a chain of tokens.
BAD: Flight booking with 6 tool calls
Each step costs roughly 200-500 tokens for the request and response, so 6 calls burn 2,000+ tokens on conversation flow alone.
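For illustration, the sequential version might look like this from the model's side (hypothetical tool names; each `await` is a full round trip the model must read before taking the next step):

```typescript
// Hypothetical 6-call flow: every call is a request/response cycle the
// model has to process before it can decide on the next one.
const from = await searchAirports("San Francisco");                    // call 1
const to = await searchAirports("Singapore");                          // call 2
const flights = await searchFlights(from.code, to.code, "2025-06-01"); // call 3
const seats = await checkAvailability(flights[0].id);                  // call 4
const price = await getPricing(flights[0].id);                         // call 5
const booking = await book(flights[0].id, seats, price);               // call 6
```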
GOOD: One tool that handles the workflow
```typescript
class BookFlightInput {
  @SchemaConstraint({ description: "Departure airport code (e.g., SFO)" })
  from!: string;

  @SchemaConstraint({ description: "Arrival airport code (e.g., SIN)" })
  to!: string;

  @SchemaConstraint({ description: "Departure date (YYYY-MM-DD)" })
  date!: string;

  @Optional()
  @SchemaConstraint({ description: "Preferred class", enum: ["economy", "business", "first"] })
  class?: string;

  @Optional()
  @SchemaConstraint({ description: "Max budget in USD" })
  maxPrice?: number;
}

@Tool({
  description: "Search and book flights. Returns top options with prices.",
  inputClass: BookFlightInput
})
async bookFlight(input: BookFlightInput) {
  // Server handles: route lookup → availability → pricing → filtering
  const flights = await this.searchFlights(input);
  return {
    flights: flights.slice(0, 5).map(f => ({
      flightNumber: f.flightNumber, // included so the agent can reference a flight
      airline: f.airline,
      departure: f.departure,
      arrival: f.arrival,
      price: f.price,
      duration: f.duration
    })),
    message: "Reply with flight number to book"
  };
}
```
Result: 1 call instead of 6. ~300 tokens instead of 2,000+.
Rule of thumb: If your workflow requires the AI to call tools in sequence, combine them into one tool. Let your server handle the orchestration.
Strategy 4: Keep Descriptions Concise
```typescript
// BAD: Documentation-style
@Tool({ description: "This tool allows you to search for products in our catalog. You can search by name, category, or SKU..." })

// GOOD: Minimal
@Tool({ description: "Search products by name" })
```
Strategy 5: Use Prompt Caching
MCP tool definitions are sent with every request. Cache them!
| Provider | Cost Reduction |
|---|---|
| Anthropic | ~90% cheaper |
| OpenAI | ~50% cheaper |
| Fireworks | ~90% cheaper |
Once cached, tool definitions cost a fraction of the normal input price, roughly a tenth with the providers above.
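With Anthropic's API, for example, you opt in by setting a `cache_control` breakpoint; placing it on the last tool caches the entire tool-definition prefix. A minimal sketch (the model name is a placeholder for whichever model you target):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-5", // placeholder model
  max_tokens: 1024,
  tools: [
    // ...all other tools first...
    {
      name: "search_products",
      description: "Search products by name",
      input_schema: {
        type: "object",
        properties: { query: { type: "string" } },
        required: ["query"]
      },
      // Marking the LAST tool caches everything up to this point, so
      // later requests read the definitions at the discounted cached rate.
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [{ role: "user", content: "Find me running shoes" }]
});
```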
Strategy 6: Don’t MCP Everything
Use sandbox/code execution for:
- Simple calculations (no calculator MCP!)
- File conversions (PNG to JPG)
- Math operations
- Data transformations
MCPs are for: authentication, custom APIs, schema validation, consistent UI.
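For example, converting a PNG to a JPG needs no dedicated MCP tool when the agent has a sandbox; a few lines of throwaway script do the job (`sharp` is just one common Node library for this):

```typescript
// Runs inside the agent's sandbox instead of shipping a convertImage tool.
import sharp from "sharp";

await sharp("input.png").jpeg({ quality: 90 }).toFile("output.jpg");
```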
Summary
| Action | Token Savings |
|---|---|
| Remove unused tools | Proportional |
| Limit response size | ~90%+ |
| Combine multi-step workflows | ~80-90% |
| Shorten descriptions | ~50-80% |
| Enable caching | ~90% |
| Use sandbox for math | 100% |