When you add a tool to your MCP, here’s what happens:
Every tool you define sends its description and input schema to the LLM as part of the prompt. The LLM reads these definitions to decide whether to call a tool and which one.
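Concretely, each tool reaches the model as a name, a description, and a JSON Schema, and every character of it counts against your input budget on every request. A sketch of the shape, following the MCP `tools/list` result format:

```typescript
// What the model sees for ONE tool (shape per MCP's `tools/list` result).
// All of this is resent as input tokens on every request.
const toolDefinition = {
  name: "search_products",
  description: "Search products by name",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search terms" }
    },
    required: ["query"]
  }
};
```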
The problem: Every tool you add increases your input token count. This has two costs:
- Context space — LLMs have finite context windows. More tokens in tool definitions = fewer tokens for conversation history and responses.
- Money — Input tokens cost money. Every request pays for all your tool definitions, whether they’re used or not.
An MCP with 50 tools and verbose descriptions can easily consume 2,000-5,000 tokens per request — before the user even says anything.
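To see where your server lands, you can approximate the overhead by tokenizing the serialized definitions. A sketch using the `js-tiktoken` package; the count is approximate, since each model tokenizes (and each client serializes) differently:

```typescript
import { getEncoding } from "js-tiktoken";

// Rough estimate: serialize your tool definitions and count tokens.
// cl100k_base is a stand-in encoding; real counts vary by model.
const enc = getEncoding("cl100k_base");

function toolTokenCost(tools: object[]): number {
  return enc.encode(JSON.stringify(tools)).length;
}

const tools = [
  {
    name: "search_products",
    description: "Search products by name",
    inputSchema: { type: "object", properties: { query: { type: "string" } } }
  }
  // ...the rest of your tools
];
console.log(`~${toolTokenCost(tools)} tokens of definitions per request`);
```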
This guide covers practical strategies to minimize token usage and keep your MCPs lean.
The Problem: APIs vs MCPs
| REST APIs | MCPs |
|---|---|
| Consumed by developers | Consumed by AI agents |
| Need detailed documentation | Need minimal hints |
| Stateless, hundreds of endpoints | Focused, purpose-built tools |
| Verbose descriptions prevent confusion | Verbose descriptions waste tokens |
Most MCPs are auto-generated from OpenAPI specs with paragraphs of explanations designed for humans. Agents don’t need this. They have knowledge in their weights. A brief description + clear schema is enough.
Auto-generated MCPs from OpenAPI specs typically serve agents poorly. Don't just wrap your API; optimize it for AI.
Strategy 1: Expose Only What You Need
Your API might have hundreds of endpoints, but your MCP shouldn’t.
```typescript
// BAD: 50+ tools for every endpoint
// GOOD: Only what agents actually use
@Tool({ description: "Search products" })
async searchProducts(input: { query: string }) { ... }

@Tool({ description: "Add to cart" })
async addToCart(input: { productId: string }) { ... }
```
Strategy 2: Optimize Output Size
APIs return paginated results with hundreds of items. MCPs should return minimal useful responses.
```typescript
// BAD: Returns 100 results
async search(query: string) {
  return await api.search(query, { pageSize: 100 });
}

// GOOD: Top 5 relevant results only
async search(query: string) {
  const results = await api.search(query);
  const sorted = rankByRelevance(results, query);
  return sorted.slice(0, 5).map(r => ({
    id: r.id,
    title: r.title,
    snippet: r.snippet.slice(0, 200)
  }));
}
```
Use cursors for large datasets — never return unbounded queries. The Exa AI MCP uses NLP to extract only relevant chunks instead of full HTML pages.
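A cursor-based version of the search tool might look like the sketch below; `api.search` and its paging options stand in for your backend:

```typescript
// Sketch: bounded pages with an opaque cursor instead of unbounded dumps.
async search(input: { query: string; cursor?: string }) {
  const page = await api.search(input.query, { after: input.cursor, pageSize: 5 });
  return {
    results: page.items.map(r => ({ id: r.id, title: r.title })),
    // Present only when more data exists; the agent sends it back if needed.
    nextCursor: page.hasMore ? page.nextCursor : undefined
  };
}
```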
Strategy 3: Combine Multi-Step Workflows
Every tool call is a round trip: the AI generates a call → the server responds → the AI processes the result → repeat. A chain of calls is a chain of tokens.
BAD: Flight booking with 6 tool calls
Each step costs roughly 200-500 tokens for the request and response, so 6 calls burn 2,000+ tokens on conversation flow alone.
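For illustration, the sequential version might look like this from the model's side (hypothetical tool names; each `await` is a full round trip the model must read before taking the next step):

```typescript
// Hypothetical 6-call flow: every call is a request/response cycle the
// model has to process before it can decide on the next one.
const from = await searchAirports("San Francisco");                    // call 1
const to = await searchAirports("Singapore");                          // call 2
const flights = await searchFlights(from.code, to.code, "2025-06-01"); // call 3
const seats = await checkAvailability(flights[0].id);                  // call 4
const price = await getPricing(flights[0].id);                         // call 5
const booking = await book(flights[0].id, seats, price);               // call 6
```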
GOOD: One tool that handles the workflow
```typescript
class BookFlightInput {
  @SchemaConstraint({ description: "Departure airport code (e.g., SFO)" })
  from!: string;

  @SchemaConstraint({ description: "Arrival airport code (e.g., SIN)" })
  to!: string;

  @SchemaConstraint({ description: "Departure date (YYYY-MM-DD)" })
  date!: string;

  @Optional()
  @SchemaConstraint({ description: "Preferred class", enum: ["economy", "business", "first"] })
  class?: string;

  @Optional()
  @SchemaConstraint({ description: "Max budget in USD" })
  maxPrice?: number;
}

@Tool({
  description: "Search and book flights. Returns top options with prices.",
  inputClass: BookFlightInput
})
async bookFlight(input: BookFlightInput) {
  // Server handles: route lookup → availability → pricing → filtering
  const flights = await this.searchFlights(input);
  return {
    flights: flights.slice(0, 5).map(f => ({
      flightNumber: f.flightNumber, // included so the agent can reference a flight
      airline: f.airline,
      departure: f.departure,
      arrival: f.arrival,
      price: f.price,
      duration: f.duration
    })),
    message: "Reply with flight number to book"
  };
}
```
Result: 1 call instead of 6. ~300 tokens instead of 2,000+.
Rule of thumb: If your workflow requires the AI to call tools in sequence, combine them into one tool. Let your server handle the orchestration.
Strategy 4: Keep Descriptions Concise
```typescript
// BAD: Documentation-style
@Tool({ description: "This tool allows you to search for products in our catalog. You can search by name, category, or SKU..." })

// GOOD: Minimal
@Tool({ description: "Search products by name" })
```
Strategy 5: Use Prompt Caching
MCP tool definitions are sent with every request. Cache them!
| Provider | Cost Reduction |
|---|---|
| Anthropic | ~90% cheaper |
| OpenAI | ~50% cheaper |
| Fireworks | ~90% cheaper |
Once cached, tool definitions cost a fraction of the normal input price, roughly a tenth with the providers above.
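With Anthropic's API, for example, you opt in by setting a `cache_control` breakpoint; placing it on the last tool caches the entire tool-definition prefix. A minimal sketch (the model name is a placeholder for whichever model you target):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-5", // placeholder model
  max_tokens: 1024,
  tools: [
    // ...all other tools first...
    {
      name: "search_products",
      description: "Search products by name",
      input_schema: {
        type: "object",
        properties: { query: { type: "string" } },
        required: ["query"]
      },
      // Marking the LAST tool caches everything up to this point, so
      // later requests read the definitions at the discounted cached rate.
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [{ role: "user", content: "Find me running shoes" }]
});
```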
Strategy 6: Don’t MCP Everything
Use sandbox/code execution for:
- Simple calculations (no calculator MCP!)
- File conversions (PNG to JPG)
- Math operations
- Data transformations
MCPs are for: authentication, custom APIs, schema validation, consistent UI.
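For example, converting a PNG to a JPG needs no dedicated MCP tool when the agent has a sandbox; a few lines of throwaway script do the job (`sharp` is just one common Node library for this):

```typescript
// Runs inside the agent's sandbox instead of shipping a convertImage tool.
import sharp from "sharp";

await sharp("input.png").jpeg({ quality: 90 }).toFile("output.jpg");
```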
Summary
| Action | Token Savings |
|---|---|
| Remove unused tools | Proportional |
| Limit response size | ~90%+ |
| Combine multi-step workflows | ~80-90% |
| Shorten descriptions | ~50-80% |
| Enable caching | ~90% |
| Use sandbox for math | 100% |