How MCP Tool Definitions Work

When you add a tool to your MCP server, here’s what happens: every tool's description and input schema is sent to the LLM as part of the prompt. The LLM reads these definitions to decide whether to call a tool, and which one. The catch: every tool you add increases the input token count of every request. This has two costs:
  1. Context space — LLMs have finite context windows. More tokens in tool definitions = fewer tokens for conversation history and responses.
  2. Money — Input tokens cost money. Every request pays for all your tool definitions, whether they’re used or not.
An MCP with 50 tools and verbose descriptions can easily consume 2,000-5,000 tokens per request — before the user even says anything.
This guide covers practical strategies to minimize token usage and keep your MCPs lean.
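For concreteness, here is roughly what a single tool definition looks like on the wire — the MCP `tools/list` response shape (`name`, `description`, `inputSchema`) — with a crude character-based token estimate. The ~4-characters-per-token figure is a rule of thumb, not an exact tokenizer:

```typescript
// Shape of one MCP tool definition as returned by tools/list
// (name, description, inputSchema are the MCP spec fields).
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: object;
}

const searchTool: ToolDefinition = {
  name: "search_products",
  description: "Search products by name",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" } },
    required: ["query"],
  },
};

// Crude estimate: ~4 characters per token (rule of thumb only).
function estimateTokens(tool: ToolDefinition): number {
  return Math.ceil(JSON.stringify(tool).length / 4);
}
```

Even a lean tool like this costs a few dozen tokens; multiply by 50 tools with paragraph-length descriptions and you reach the thousands-of-tokens range quickly.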

The Problem: APIs vs MCPs

REST APIs                              | MCPs
---------------------------------------|----------------------------------
Consumed by developers                 | Consumed by AI agents
Need detailed documentation            | Need minimal hints
Stateless, hundreds of endpoints       | Focused, purpose-built tools
Verbose descriptions prevent confusion | Verbose descriptions waste tokens
Most MCPs are auto-generated from OpenAPI specs with paragraphs of explanations designed for humans. Agents don’t need this. They have knowledge in their weights. A brief description + clear schema is enough.
Auto-generated MCPs from OpenAPI specs are typically pathetic for agents. Don’t just wrap your API — optimize it for AI.

Strategy 1: Expose Only What You Need

Your API might have hundreds of endpoints, but your MCP shouldn’t.
// BAD: 50+ tools for every endpoint

// GOOD: Only what agents actually use
@Tool({ description: "Search products" })
async searchProducts(input: { query: string }) { ... }

@Tool({ description: "Add to cart" })  
async addToCart(input: { productId: string }) { ... }

Strategy 2: Optimize Output Size

APIs return paginated results with hundreds of items. MCPs should return minimal useful responses.
// BAD: Returns 100 results
async search(query: string) {
  return await api.search(query, { pageSize: 100 });
}

// GOOD: Top 5 relevant results only
async search(query: string) {
  const results = await api.search(query);
  const sorted = rankByRelevance(results, query);
  return sorted.slice(0, 5).map(r => ({
    id: r.id, title: r.title, snippet: r.snippet.slice(0, 200)
  }));
}
Use cursors for large datasets — never return unbounded queries. The Exa AI MCP uses NLP to extract only relevant chunks instead of full HTML pages.
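A minimal sketch of the cursor pattern, assuming a hypothetical in-memory dataset (filtering by `query` is omitted for brevity). The cursor is just an opaque base64-encoded offset that the agent passes back to get the next page:

```typescript
// Hypothetical paginated search: return a small page plus an opaque
// cursor the agent can pass back to fetch the next page.
interface Item { id: string; title: string }

const ALL_ITEMS: Item[] = Array.from({ length: 42 }, (_, i) => ({
  id: String(i),
  title: `Item ${i}`,
}));

function search(query: string, cursor?: string, pageSize = 5) {
  // Decode the opaque cursor back into an offset (0 on first call).
  const offset = cursor
    ? Number(Buffer.from(cursor, "base64").toString())
    : 0;
  const items = ALL_ITEMS.slice(offset, offset + pageSize);
  // Only hand out a cursor when more results remain.
  const nextCursor = offset + pageSize < ALL_ITEMS.length
    ? Buffer.from(String(offset + pageSize)).toString("base64")
    : undefined;
  return { items, nextCursor };
}
```

The agent never sees an unbounded result set: each call returns at most `pageSize` items, and it can decide whether fetching the next page is worth the tokens.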

Strategy 3: Design for Minimal Tool Calls

Every tool call is a round trip: AI generates → server responds → AI processes → repeat. A chain of calls is a chain of tokens.
BAD: Flight booking with 6 tool calls. Each step costs ~200-500 tokens for request + response, so 6 calls burn 2,000+ tokens on conversation flow alone.
GOOD: One tool that handles the workflow
class BookFlightInput {
  @SchemaConstraint({ description: "Departure airport code (e.g., SFO)" })
  from!: string;
  
  @SchemaConstraint({ description: "Arrival airport code (e.g., SIN)" })
  to!: string;
  
  @SchemaConstraint({ description: "Departure date (YYYY-MM-DD)" })
  date!: string;
  
  @Optional()
  @SchemaConstraint({ description: "Preferred class", enum: ["economy", "business", "first"] })
  class?: string;
  
  @Optional()
  @SchemaConstraint({ description: "Max budget in USD" })
  maxPrice?: number;
}

@Tool({ 
  description: "Search and book flights. Returns top options with prices.", 
  inputClass: BookFlightInput 
})
async bookFlight(input: BookFlightInput) {
  // Server handles: route lookup → availability → pricing → filtering
  const flights = await this.searchFlights(input);
  
  return {
    flights: flights.slice(0, 5).map(f => ({
      airline: f.airline,
      departure: f.departure,
      arrival: f.arrival,
      price: f.price,
      duration: f.duration
    })),
    message: "Reply with flight number to book"
  };
}
Result: 1 call instead of 6. ~300 tokens instead of 2,000+.
Rule of thumb: If your workflow requires the AI to call tools in sequence, combine them into one tool. Let your server handle the orchestration.

Strategy 4: Keep Descriptions Concise

// BAD: Documentation-style
@Tool({ description: "This tool allows you to search for products in our catalog. You can search by name, category, or SKU..." })

// GOOD: Minimal
@Tool({ description: "Search products by name" })

Strategy 5: Use Token Caching

MCP tool definitions are sent with every request. Cache them!
Provider  | Cost Reduction
----------|---------------
Anthropic | ~90% cheaper
OpenAI    | ~50% cheaper
Fireworks | ~90% cheaper
Once cached, definitions cost a fraction of the normal input price — roughly a tenth on Anthropic and Fireworks, half on OpenAI.
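With Anthropic's prompt caching, for example, you add a `cache_control` breakpoint to the last tool definition; everything up to and including that block is cached for subsequent requests. A sketch of the Messages API request body (no SDK, just the JSON shape; the model name is illustrative):

```typescript
// Sketch of an Anthropic Messages API request body with prompt caching.
// Marking the LAST tool with cache_control caches the entire
// tool-definition prefix; later requests pay the cheaper cache-read rate.
const requestBody = {
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  tools: [
    {
      name: "search_products",
      description: "Search products by name",
      input_schema: {
        type: "object",
        properties: { query: { type: "string" } },
        required: ["query"],
      },
    },
    {
      name: "add_to_cart",
      description: "Add a product to the cart",
      input_schema: {
        type: "object",
        properties: { productId: { type: "string" } },
        required: ["productId"],
      },
      // Cache breakpoint: everything up to here is cached.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Find me a red hoodie" }],
};
```

Keep the tool list stable between requests — caching only helps if the prefix is byte-identical.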

Strategy 6: Don’t MCP Everything

Use sandbox/code execution for:
  • Simple calculations (no calculator MCP!)
  • File conversions (PNG to JPG)
  • Math operations
  • Data transformations
MCPs are for: authentication, custom APIs, schema validation, consistent UI.

Summary

Action                       | Token Savings
-----------------------------|---------------
Remove unused tools          | Proportional
Limit response size          | ~90%+
Combine multi-step workflows | ~80-90%
Shorten descriptions         | ~50-80%
Enable caching               | ~90%
Use sandbox for math         | 100%