# Reducing Tokens in MCPs

> Best practices for minimizing token usage in your MCP servers

## How MCP Tool Definitions Work

When you add a tool to your MCP, here's what happens:

```mermaid theme={null}
flowchart LR
    T[Your Tools] -->|Descriptions + Schemas| LLM[LLM]
    LLM -->|Decision| TC{Call tool?}
    TC -->|Yes| E[Execute]
    TC -->|No| R[Respond]
```

Every tool you define sends its **description** and **input schema** to the LLM as part of the prompt. The LLM reads these definitions to decide whether to call a tool and which one.

**The problem:** Every tool you add increases your input token count. This has two costs:

1. **Context space** — LLMs have finite context windows. More tokens in tool definitions = fewer tokens for conversation history and responses.
2. **Money** — Input tokens cost money. Every request pays for all your tool definitions, whether they're used or not.

<Warning>
  An MCP with 50 tools and verbose descriptions can easily consume **2,000-5,000 tokens per request** — before the user even says anything.
</Warning>
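To see where your own server lands, you can estimate the footprint of its tool definitions. The sketch below is a rough approximation, not a real tokenizer: it assumes about 4 characters per token, and the `ToolDefinition` shape and function names are hypothetical. Use your provider's token counter when you need exact numbers.

```typescript
// Hypothetical shape of a tool definition as it is sent to the LLM.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

// Assumption: ~4 characters per token. This is a common rule of thumb,
// not an exact count for any specific model.
const CHARS_PER_TOKEN = 4;

// Estimate the tokens one tool adds to every request.
function estimateTokens(tool: ToolDefinition): number {
  const serialized = JSON.stringify(tool);
  return Math.ceil(serialized.length / CHARS_PER_TOKEN);
}

// Estimate the fixed per-request overhead of the whole tool set.
function estimateTotalTokens(tools: ToolDefinition[]): number {
  return tools.reduce((sum, tool) => sum + estimateTokens(tool), 0);
}
```

Running `estimateTotalTokens` over your registered tools before and after a cleanup gives a quick, relative measure of what each change saves.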

This guide covers practical strategies to minimize token usage and keep your MCPs lean.

***

## The Problem: APIs vs MCPs

| REST APIs                              | MCPs                              |
| -------------------------------------- | --------------------------------- |
| Consumed by **developers**             | Consumed by **AI agents**         |
| Need detailed documentation            | Need minimal hints                |
| Stateless, hundreds of endpoints       | Focused, purpose-built tools      |
| Verbose descriptions prevent confusion | Verbose descriptions waste tokens |

Many MCPs are auto-generated from OpenAPI specs and inherit paragraphs of explanation written for human readers. **Agents don't need this.** The model already knows common concepts (search, pagination, carts) from its training, so a brief description plus a clear schema is enough.

<Warning>
  Auto-generated MCPs from OpenAPI specs are typically **a poor fit for agents**. Don't just wrap your API; optimize it for AI.
</Warning>

***

## Strategy 1: Expose Only What You Need

Your API might have hundreds of endpoints, but your MCP shouldn't.

```typescript theme={null}
// BAD: one tool per API endpoint (50+ tools), all sent with every request

// GOOD: Only what agents actually use
@Tool({ description: "Search products" })
async searchProducts(input: { query: string }) { ... }

@Tool({ description: "Add to cart" })
async addToCart(input: { productId: string }) { ... }
```
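If your tools are generated from a spec, one way to apply this is an explicit allowlist that filters the generated set before registration. This is a minimal sketch: `GeneratedTool`, `AGENT_TOOL_ALLOWLIST`, and `selectAgentTools` are hypothetical names, not part of any SDK.

```typescript
// Hypothetical shape of a tool produced by a generator.
interface GeneratedTool {
  name: string;
  description: string;
}

// Assumption: you maintain this list by hand, based on what agents actually call.
const AGENT_TOOL_ALLOWLIST: ReadonlySet<string> = new Set([
  "searchProducts",
  "addToCart",
]);

// Keep only allowlisted tools, preserving their original order.
function selectAgentTools<T extends { name: string }>(
  allTools: readonly T[],
  allowlist: ReadonlySet<string> = AGENT_TOOL_ALLOWLIST,
): T[] {
  return allTools.filter((tool) => allowlist.has(tool.name));
}
```

An allowlist fails closed: a newly generated endpoint stays hidden from the agent until someone decides it is worth its tokens.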

***

## Strategy 2: Optimize Output Size

APIs often return paginated results with hundreds of items per page. MCPs should return **minimal useful responses**.

```typescript theme={null}
// BAD: Returns 100 results
async search(query: string) {
  return await api.search(query, { pageSize: 100 });
}

// GOOD: Top 5 relevant results only
async search(query: string) {
  const results = await api.search(query);
  const sorted = rankByRelevance(results, query);
  return sorted.slice(0, 5).map(r => ({
    id: r.id, title: r.title, snippet: r.snippet.slice(0, 200)
  }));
}
```

**Use cursors for large datasets** and never return unbounded result sets. For example, the [Exa AI MCP](https://github.com/exa-labs/exa-mcp-server) uses NLP to extract only relevant chunks instead of full HTML pages.
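A cursor can be as simple as an opaque string that tells the server where the next page starts. The sketch below assumes an in-memory list and uses a stringified offset as the cursor. `Page`, `paginate`, and `MAX_PAGE_SIZE` are hypothetical names, and a real backend would more likely use a database key or an upstream API's own cursor.

```typescript
// One page of results plus an optional cursor for the next page.
interface Page<T> {
  items: T[];
  nextCursor?: string;
}

// Hard upper bound so no call can return an unbounded result set.
const MAX_PAGE_SIZE = 5;

function paginate<T>(
  all: readonly T[],
  cursor?: string,
  pageSize: number = MAX_PAGE_SIZE,
): Page<T> {
  // Assumption for this sketch: the cursor is a stringified start offset.
  const start = cursor ? Number.parseInt(cursor, 10) : 0;
  if (!Number.isInteger(start) || start < 0) {
    throw new Error(`Invalid cursor: ${cursor}`);
  }
  // Clamp the requested size into [1, MAX_PAGE_SIZE].
  const size = Math.min(Math.max(1, pageSize), MAX_PAGE_SIZE);
  const end = start + size;
  const items = all.slice(start, end);
  // Only emit a cursor when there is something left to fetch.
  return end < all.length ? { items, nextCursor: String(end) } : { items };
}
```

The agent only pays for a further page when it passes `nextCursor` back, so the common case stays at one small response.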

***

## Strategy 3: Design for Minimal Tool Calls

Every tool call is a round trip: AI generates → server responds → AI processes → repeat. Chain of calls = chain of tokens.

**BAD: Flight booking with 6 tool calls**

```mermaid theme={null}
flowchart LR
    A[searchRoutes] --> B[getFlights]
    B --> C[checkAvailability]
    C --> D[getSeatMap]
    D --> E[calculatePrice]
    E --> F[bookFlight]
```

Each step: \~200-500 tokens for request + response. **6 calls = roughly 1,200-3,000 tokens** (typically around 2,000) just for the conversation flow.

**GOOD: One tool that handles the workflow**

```typescript theme={null}
class BookFlightInput {
  @SchemaConstraint({ description: "Departure airport code (e.g., SFO)" })
  from!: string;
  
  @SchemaConstraint({ description: "Arrival airport code (e.g., SIN)" })
  to!: string;
  
  @SchemaConstraint({ description: "Departure date (YYYY-MM-DD)" })
  date!: string;
  
  @Optional()
  @SchemaConstraint({ description: "Preferred class", enum: ["economy", "business", "first"] })
  class?: string;
  
  @Optional()
  @SchemaConstraint({ description: "Max budget in USD" })
  maxPrice?: number;
}

@Tool({ 
  description: "Search and book flights. Returns top options with prices.", 
  inputClass: BookFlightInput 
})
async bookFlight(input: BookFlightInput) {
  // Server handles: route lookup → availability → pricing → filtering
  const flights = await this.searchFlights(input);
  
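  return {
    flights: flights.slice(0, 5).map(f => ({
      // Assumes each flight record has a flightNumber; the agent needs it
      // to act on the message below
      flightNumber: f.flightNumber,
      airline: f.airline,
      departure: f.departure,
      arrival: f.arrival,
      price: f.price,
      duration: f.duration
    })),
    message: "Reply with flight number to book"
  };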
}
```

**Result:** 1 call instead of 6. \~300 tokens instead of 2,000+.

<Tip>
  **Rule of thumb:** If your workflow requires the AI to call tools in sequence, combine them into one tool. Let your server handle the orchestration.
</Tip>

***

## Strategy 4: Keep Descriptions Concise

```typescript theme={null}
// BAD: Documentation-style
@Tool({ description: "This tool allows you to search for products in our catalog. You can search by name, category, or SKU..." })

// GOOD: Minimal
@Tool({ description: "Search products by name" })
```
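Descriptions tend to grow back over time, so it can help to enforce a budget in CI. The check below is a minimal sketch: the 80-character limit is an arbitrary assumption, and `findVerboseDescriptions` is a hypothetical helper rather than a LeanMCP API.

```typescript
// Assumption: 80 characters is an arbitrary budget; tune it for your tools.
const MAX_DESCRIPTION_CHARS = 80;

// Return one human-readable warning per tool whose description is over budget.
function findVerboseDescriptions(
  tools: readonly { name: string; description: string }[],
  maxChars: number = MAX_DESCRIPTION_CHARS,
): string[] {
  return tools
    .filter((tool) => tool.description.length > maxChars)
    .map((tool) => `${tool.name}: ${tool.description.length} chars (max ${maxChars})`);
}
```

Failing the build when this returns a non-empty list keeps documentation-style descriptions from creeping back in.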

***

## Strategy 5: Use Token Caching

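MCP tool definitions are sent with every request, so they form a stable prefix of the prompt. Enable your provider's prompt caching so that this repeated prefix is billed at a discount. Caching lowers cost, not context usage: cached definitions still occupy the context window.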

| Provider  | Cost Reduction |
| --------- | -------------- |
| Anthropic | \~90% cheaper  |
| OpenAI    | \~50% cheaper  |
| Fireworks | \~90% cheaper  |

Once cached, definitions cost roughly **1/10th to 1/2** of the normal input-token price, depending on the provider.
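How you enable caching depends on the provider. On Anthropic's Messages API it is opt-in: a `cache_control` marker of type `ephemeral` on the last tool definition marks the tool list up to that point as a cacheable prefix. The sketch below only builds the `tools` array for such a request. `AnthropicTool` and `withCachedTools` are hypothetical names. Providers may also require a minimum prefix length before caching applies, so check the current documentation.

```typescript
// Tool definition in the shape used by Anthropic's Messages API.
interface AnthropicTool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  cache_control?: { type: "ephemeral" };
}

// Return a copy of the tool list with a cache breakpoint on the last tool,
// so the full set of definitions is treated as one cacheable prefix.
function withCachedTools(tools: readonly AnthropicTool[]): AnthropicTool[] {
  return tools.map((tool, index): AnthropicTool =>
    index === tools.length - 1
      ? { ...tool, cache_control: { type: "ephemeral" } }
      : tool,
  );
}
```

Keep the tool list byte-for-byte stable between requests (same order, same descriptions), because any change to the prefix invalidates the cache.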

***

## Strategy 6: Don't MCP Everything

Use sandbox/code execution for:

* Simple calculations (no calculator MCP!)
* File conversions (PNG to JPG)
* Math operations
* Data transformations

Reserve MCP tools for what a sandbox cannot provide: authentication, custom APIs, schema validation, and a consistent UI.

***

## Summary

| Action                       | Token Savings |
| ---------------------------- | ------------- |
| Remove unused tools          | Proportional  |
| Limit response size          | \~90%+        |
| Combine multi-step workflows | \~80-90%      |
| Shorten descriptions         | \~50-80%      |
| Enable caching               | \~50-90% cost |
| Use sandbox for math         | 100%          |
