
Token Optimization

The AI Gateway provides tools to understand, analyze, and optimize your AI token usage. Reduce costs while maintaining quality through data-driven decisions.

Understanding Your Usage

Token Analytics Dashboard

See exactly where your tokens are going:
  • By model - Compare costs across GPT-5.2, Claude, etc.
  • By feature - Which parts of your app use the most tokens
  • By user - Identify heavy users and usage patterns
  • Over time - Track trends and spot anomalies

Cost Breakdown

| Metric | Description |
| --- | --- |
| Input Tokens | Tokens in the prompt you send |
| Output Tokens | Tokens in the AI response |
| Total Cost | Combined cost (output tokens typically cost more) |
| Requests | Number of API calls |
| Avg Tokens/Request | Efficiency metric |
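The Avg Tokens/Request metric can also be derived from raw usage records; a minimal sketch (the record shape here is an assumption, not the exact export schema):

```javascript
// Derive the dashboard metrics from raw usage records.
// The record fields are illustrative — check your actual export schema.
function summarizeUsage(records) {
  const inputTokens = records.reduce((sum, r) => sum + r.inputTokens, 0);
  const outputTokens = records.reduce((sum, r) => sum + r.outputTokens, 0);
  return {
    inputTokens,
    outputTokens,
    requests: records.length,
    avgTokensPerRequest: (inputTokens + outputTokens) / records.length,
  };
}

const stats = summarizeUsage([
  { inputTokens: 120, outputTokens: 80 },
  { inputTokens: 200, outputTokens: 100 },
]);
// stats.requests === 2, stats.avgTokensPerRequest === 250
```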

A/B Testing

Test different approaches to find the most cost-effective solution.

What to Test

Models

GPT-5.2 vs Claude
  • Quality vs cost tradeoffs
  • Task-specific performance

Prompts

Different system prompts
  • Shorter vs detailed instructions
  • Different tones/styles

Context Length

How much context to include
  • Minimal vs comprehensive
  • Impact on quality

Temperature

Model creativity settings
  • Lower for consistent outputs
  • Higher for variety

Setting Up an A/B Test

// Create an experiment
const experiment = await leanmcp.gateway.createExperiment({
  name: 'Model Comparison Q1 2024',
  variants: [
    { name: 'gpt-5.2', weight: 50, config: { model: 'gpt-5.2' } },
    { name: 'claude-3', weight: 50, config: { model: 'claude-sonnet-4-5-20250929' } },
  ],
  metrics: ['quality_rating', 'tokens_used', 'latency', 'cost'],
  duration: '14d',
});

Using Experiments in Code

// Get variant for user
const variant = await leanmcp.gateway.getVariant({
  experimentId: experiment.id,
  userId: userId,
});

// Use the assigned model
const response = await client.chat.completions.create({
  model: variant.config.model,
  messages: messages,
}, {
  headers: {
    'X-Experiment-ID': experiment.id,
    'X-Variant': variant.name,
  }
});

Tracking Outcomes

// Record quality metric (e.g., from user feedback)
await leanmcp.gateway.recordOutcome({
  experimentId: experiment.id,
  userId: userId,
  outcome: {
    quality_rating: 4.5, // 1-5 scale
    user_satisfied: true,
  }
});

Analyzing Results

The dashboard shows:
  • Statistical significance - Is the difference real?
  • Cost comparison - Savings per variant
  • Quality metrics - User satisfaction scores
  • Recommendation - Which variant to choose
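The dashboard computes significance for you. For intuition only, the check behind "is the difference real?" can be sketched as a two-proportion z-test on the `user_satisfied` rates (the counts below are made up):

```javascript
// Rough two-proportion z-test comparing "user_satisfied" rates
// between two variants. For intuition only — the dashboard runs
// the real statistical analysis.
function zScore(successA, totalA, successB, totalB) {
  const pA = successA / totalA;
  const pB = successB / totalB;
  const pPool = (successA + successB) / (totalA + totalB);
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / totalA + 1 / totalB));
  return (pA - pB) / se;
}

// |z| > 1.96 roughly corresponds to significance at the 95% level
const z = zScore(420, 500, 380, 500); // 84% vs 76% satisfied
```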

Competitor Insights

Learn from how others optimize.

Benchmarking

Compare your usage to industry averages:
  • Tokens per request - Are your prompts too long?
  • Model distribution - Are you using expensive models unnecessarily?
  • Error rates - Are you making inefficient retries?
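A simple way to act on the "tokens per request" benchmark is to flag features that sit well above the industry average; a sketch, where the threshold and benchmark value are arbitrary examples:

```javascript
// Flag features whose average prompt size is well above a benchmark.
// The 20% margin and the benchmark value are arbitrary examples.
function flagLongPrompts(features, benchmarkTokensPerRequest) {
  return features
    .filter(f => f.avgTokensPerRequest > benchmarkTokensPerRequest * 1.2)
    .map(f => f.name);
}

const flagged = flagLongPrompts(
  [
    { name: 'chat', avgTokensPerRequest: 900 },
    { name: 'summarize', avgTokensPerRequest: 400 },
  ],
  500 // hypothetical industry average
);
// flagged === ['chat']
```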

Learning from Patterns

Aggregated, anonymized insights from the platform help you understand best practices without exposing anyone’s specific implementation.
Common optimizations we’ve identified:
  • Roughly 60% of traffic sent to the largest models can be served by smaller models with minimal quality loss
  • Shorter system prompts often perform equally well
  • Caching common queries reduces costs by 30-40%

Optimization Strategies

1. Right-Size Your Models

Not every request needs GPT-5.2:
// Route based on complexity: send hard requests to the full model,
// everything else to a cheaper model of your choice
const model = estimateComplexity(message) > 0.7
  ? 'gpt-5.2'                     // full model for complex requests
  : 'claude-sonnet-4-5-20250929'; // substitute your cheaper model here

const response = await client.chat.completions.create({
  model: model,
  messages: messages,
});
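The routing code leaves `estimateComplexity` undefined. One naive heuristic (purely illustrative — production routers often use a small classifier model instead) scores by prompt length and keywords:

```javascript
// A naive complexity heuristic for routing — purely illustrative.
// Real routers often use a small classifier model instead.
function estimateComplexity(message) {
  const hardKeywords = ['prove', 'refactor', 'architecture', 'debug'];
  let score = Math.min(message.length / 2000, 0.5); // long prompts → harder
  if (hardKeywords.some(k => message.toLowerCase().includes(k))) {
    score += 0.5;
  }
  return Math.min(score, 1);
}
```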

2. Optimize Prompts

  • Trim context - Only include information the model actually needs. More context = more tokens.
  • Keep system prompts short - “You are a helpful coding assistant” works as well as a 500-word description for most tasks.
  • Cap output length - Use max_tokens to prevent unnecessarily long responses.
  • Constrain the format - “Reply in JSON format” or “Answer in one sentence” reduces output tokens.
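The context-trimming tip can be applied mechanically before each request; a minimal sketch (the window size is an arbitrary choice):

```javascript
// Trim conversation context before sending: keep the (short) system
// prompt and only the most recent turns. Window size is an example.
function trimContext(messages, maxTurns = 6) {
  const system = messages.filter(m => m.role === 'system');
  const rest = messages.filter(m => m.role !== 'system');
  return [...system, ...rest.slice(-maxTurns)];
}

const history = [
  { role: 'system', content: 'You are a helpful coding assistant' },
  ...Array.from({ length: 20 }, (_, i) => ({ role: 'user', content: `msg ${i}` })),
];
const trimmed = trimContext(history);
// 1 system message + the 6 most recent turns = 7 messages
```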

3. Implement Caching

Cache identical or similar requests:
// Enable response caching
const response = await client.chat.completions.create({
  model: 'gpt-5.2',
  messages: messages,
}, {
  headers: {
    'X-Enable-Cache': 'true',
    'X-Cache-TTL': '3600', // 1 hour
  }
});
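Gateway-side caching via `X-Enable-Cache` is handled for you. If you also want to deduplicate identical requests within a single process, an in-memory sketch (not part of the gateway API):

```javascript
// Minimal in-process cache keyed by the request payload, with a TTL.
// For shared, cross-instance caching use the gateway's
// X-Enable-Cache header instead.
const cache = new Map();

async function cachedCompletion(payload, fetchFn, ttlMs = 3600 * 1000) {
  const key = JSON.stringify(payload);
  const hit = cache.get(key);
  if (hit && Date.now() - hit.at < ttlMs) return hit.value; // fresh hit
  const value = await fetchFn(payload); // miss: make the real call
  cache.set(key, { value, at: Date.now() });
  return value;
}
```

Identical payloads within the TTL return the cached response and skip the API call entirely.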

4. Batch Similar Requests

Combine multiple small requests:
// Instead of 10 separate calls
// Batch into one request with multiple items
const response = await client.chat.completions.create({
  model: 'gpt-5.2',
  messages: [{
    role: 'user',
    content: `Analyze these 10 items:\n${items.join('\n')}`
  }],
});
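Batching also needs the reverse step: splitting the single reply back into per-item answers. A sketch, assuming you instruct the model to answer as a numbered list:

```javascript
// Build one batched prompt and split the numbered reply back into
// per-item answers. Assumes the model is told to reply as a
// numbered list matching the input order.
function buildBatchPrompt(items) {
  return `Analyze these ${items.length} items. Reply as a numbered list:\n` +
    items.map((item, i) => `${i + 1}. ${item}`).join('\n');
}

function splitNumberedReply(text) {
  return text
    .split(/\n(?=\d+\.\s)/)           // break before each "N. " line
    .map(line => line.replace(/^\d+\.\s*/, '').trim());
}

const answers = splitNumberedReply('1. positive\n2. negative\n3. neutral');
// answers === ['positive', 'negative', 'neutral']
```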

Cost Alerts and Limits

Budget Controls

// Set spending limits
await leanmcp.gateway.setBudget({
  daily: 50.00,
  weekly: 200.00,
  monthly: 500.00,
  action: 'alert', // or 'block' to hard stop
});

Alert Configuration

await leanmcp.gateway.createCostAlert({
  threshold: 100.00, // dollars
  period: 'daily',
  channels: ['email', 'slack'],
});

Reporting

Usage Reports

Generate detailed reports:
const report = await leanmcp.gateway.generateReport({
  type: 'usage',
  period: 'monthly',
  groupBy: ['model', 'feature', 'user'],
  format: 'pdf',
});

Export for Analysis

# Export token usage data
curl -X GET "https://api.leanmcp.com/gateway/usage?period=monthly" \
  -H "Authorization: Bearer your-api-key" \
  -o usage-report.json
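Once exported, the JSON can be analyzed however you like, for example totaling cost per model. The record fields below are assumptions — inspect `usage-report.json` for the real schema:

```javascript
// Summarize an exported usage report by model.
// The record fields are assumptions — check the actual export schema.
function costByModel(records) {
  const totals = {};
  for (const r of records) {
    totals[r.model] = (totals[r.model] || 0) + r.cost;
  }
  return totals;
}

const totals = costByModel([
  { model: 'gpt-5.2', cost: 12.5 },
  { model: 'claude-sonnet-4-5-20250929', cost: 4.0 },
  { model: 'gpt-5.2', cost: 7.5 },
]);
// totals['gpt-5.2'] === 20
```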

ROI Calculator

Understand the value of optimization:
| Scenario | Current Cost | Optimized Cost | Savings |
| --- | --- | --- | --- |
| Model right-sizing | $1,000/mo | $600/mo | 40% |
| Prompt optimization | $600/mo | $450/mo | 25% |
| Caching | $450/mo | $350/mo | 22% |
| Total | $1,000/mo | $350/mo | 65% |
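Note that the savings compound: each optimization applies to the cost remaining after the previous one, which is why the row percentages do not sum to the total. The arithmetic:

```javascript
// Each optimization applies to the cost remaining after the previous
// one, so the per-row percentages compound into the 65% total.
const steps = [
  { name: 'Model right-sizing', after: 600 },
  { name: 'Prompt optimization', after: 450 },
  { name: 'Caching', after: 350 },
];

let cost = 1000; // current monthly cost
for (const step of steps) {
  const savings = Math.round((1 - step.after / cost) * 100);
  console.log(`${step.name}: $${cost}/mo -> $${step.after}/mo (${savings}%)`);
  cost = step.after;
}

const totalSavings = Math.round((1 - 350 / 1000) * 100); // 65
```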

Best Practices

  • Measure first - You can’t optimize what you don’t measure. Set up tracking before making changes.
  • Test one change at a time - Run isolated A/B tests to understand the impact of each change.
  • Balance cost and quality - The cheapest option isn’t always the best. Track quality metrics alongside cost.
  • Review regularly - Usage patterns change. Schedule monthly reviews of your optimization strategies.
  • Set budget alerts - They prevent surprise bills and catch issues quickly.

Next Steps