LiteLLM Integration

LiteLLM is a Python SDK that lets you call 100+ LLM providers with one unified interface. When you point LiteLLM’s api_base at the LeanMCP AI Gateway, every request (including tool calls, token usage, and cost) is logged to your observability dashboard with zero changes to your model code. This is useful when you:

Run evaluations or benchmarks across multiple models and want a single place to inspect every call
Use tool-calling agents and need to see the full request/response cycle per tool invocation
Want cost and latency tracking without adding custom instrumentation

Prerequisites

Get Credits

Purchase credits at app.leanmcp.com/billing

Create API Key

Create an API key at app.leanmcp.com/api-keys with SDK permissions

Gateway Endpoints

Provider	Gateway Base URL
OpenAI	`https://aigateway.leanmcp.com/v1/openai`
Anthropic	`https://aigateway.leanmcp.com/v1/anthropic`
xAI (Grok)	`https://aigateway.leanmcp.com/v1/xai`
Fireworks	`https://aigateway.leanmcp.com/v1/fireworks`

Basic Usage

Pass api_base and api_key to litellm.completion(). LiteLLM sends the request to the URL in api_base, so it reaches the gateway first, and the gateway forwards it to the provider.

Python first, then the equivalent cURL request:

import litellm
import os

response = litellm.completion(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the LeanMCP AI Gateway?"}
    ],
    api_base="https://aigateway.leanmcp.com/v1/openai",
    api_key=os.environ["LEANMCP_API_KEY"],
)

print(response.choices[0].message.content)

curl -X POST https://aigateway.leanmcp.com/v1/openai/chat/completions \
  -H "Authorization: Bearer $LEANMCP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the LeanMCP AI Gateway?"}]
  }'
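Since the gateway logs token usage for each request, it can help to print the same numbers locally and compare them with the dashboard. The sketch below assumes the response object exposes `model` and a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`, as LiteLLM's OpenAI-style responses do. `usage_summary` and `fake_response` are illustrative names, not part of LiteLLM or the gateway.

```python
from types import SimpleNamespace


def usage_summary(response):
    """Return the token counts of a completion response as a plain dict."""
    usage = response.usage
    return {
        "model": response.model,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


# Stand-in shaped like a completion response, so the sketch runs offline.
# In real use, pass the object returned by litellm.completion(...).
fake_response = SimpleNamespace(
    model="gpt-4o",
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30, total_tokens=42),
)

summary = usage_summary(fake_response)
print(summary)
```

In real code, call `usage_summary(response)` right after the `litellm.completion(...)` call shown above.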

Using Different Providers

Swap the api_base URL and use the provider-specific model prefix that LiteLLM expects.
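One way to keep the base URL and model prefix in sync is a small lookup helper. This is a sketch, not part of LiteLLM or the gateway: `PROVIDERS` and `gateway_kwargs` are hypothetical names, and the mapping covers only the three providers that have LiteLLM examples on this page (xAI is left out because no LiteLLM model string is shown for it here).

```python
import os

GATEWAY_ROOT = "https://aigateway.leanmcp.com/v1"

# provider name -> (gateway path segment, LiteLLM model prefix)
# Taken from the examples on this page; OpenAI models are used without a prefix.
PROVIDERS = {
    "openai": ("openai", ""),
    "anthropic": ("anthropic", "anthropic/"),
    "fireworks": ("fireworks", "fireworks_ai/"),
}


def gateway_kwargs(provider, model, api_key=None):
    """Build the model, api_base, and api_key arguments for litellm.completion()."""
    path, prefix = PROVIDERS[provider]
    # Add the prefix only if the caller has not already included it.
    full_model = model if model.startswith(prefix) else prefix + model
    return {
        "model": full_model,
        "api_base": f"{GATEWAY_ROOT}/{path}",
        "api_key": api_key or os.environ["LEANMCP_API_KEY"],
    }


kwargs = gateway_kwargs(
    "fireworks",
    "accounts/fireworks/models/llama-v3p1-8b-instruct",
    api_key="leanmcp_your_api_key_here",  # placeholder key for the sketch
)
print(kwargs["model"])
print(kwargs["api_base"])
```

The result can be unpacked into the call, for example `litellm.completion(messages=[...], **kwargs)`.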

OpenAI

Python first, then the equivalent cURL request:

import litellm
import os

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from OpenAI via the gateway."}],
    api_base="https://aigateway.leanmcp.com/v1/openai",
    api_key=os.environ["LEANMCP_API_KEY"],
)

print(response.choices[0].message.content)

curl -X POST https://aigateway.leanmcp.com/v1/openai/chat/completions \
  -H "Authorization: Bearer $LEANMCP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello from OpenAI via the gateway."}]
  }'

Anthropic

Python first, then the equivalent cURL request:

import litellm
import os

response = litellm.completion(
    model="anthropic/claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Hello from Anthropic via the gateway."}],
    api_base="https://aigateway.leanmcp.com/v1/anthropic",
    api_key=os.environ["LEANMCP_API_KEY"],
)

print(response.choices[0].message.content)

curl -X POST https://aigateway.leanmcp.com/v1/anthropic/v1/messages \
  -H "Authorization: Bearer $LEANMCP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello from Anthropic via the gateway."}]
  }'

Fireworks

LiteLLM requires the fireworks_ai/ prefix for Fireworks models.

Python first, then the equivalent cURL request:

import litellm
import os

response = litellm.completion(
    model="fireworks_ai/accounts/fireworks/models/llama-v3p1-8b-instruct",
    messages=[{"role": "user", "content": "Hello from Fireworks via the gateway."}],
    max_tokens=1024,
    temperature=0.0,
    api_base="https://aigateway.leanmcp.com/v1/fireworks",
    api_key=os.environ["LEANMCP_API_KEY"],
)

print(response.choices[0].message.content)

curl -X POST https://aigateway.leanmcp.com/v1/fireworks/chat/completions \
  -H "Authorization: Bearer $LEANMCP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello from Fireworks via the gateway."}],
    "max_tokens": 1024,
    "temperature": 0.0
  }'

Streaming

Streaming works the same way. Set stream=True and iterate over chunks.

Python first, then the equivalent cURL request:

import litellm
import os

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem."}],
    api_base="https://aigateway.leanmcp.com/v1/openai",
    api_key=os.environ["LEANMCP_API_KEY"],
    stream=True,
)

for chunk in response:
    content = chunk.choices[0].delta.content or ""
    print(content, end="", flush=True)

curl -N -X POST https://aigateway.leanmcp.com/v1/openai/chat/completions \
  -H "Authorization: Bearer $LEANMCP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a short poem."}],
    "stream": true
  }'
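If you need the complete text after streaming finishes (for example, to store it next to an evaluation result), the chunks can be accumulated instead of printed. This is a minimal sketch that assumes each chunk has the `choices[0].delta.content` shape used in the Python example above. `collect_stream` and the fake chunks are illustrative only.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no choices or no text; skip or treat as empty.
        if not chunk.choices:
            continue
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def _fake_chunk(text):
    """Build a stand-in chunk so the sketch runs without a network call."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    _fake_chunk("Roses "),
    _fake_chunk(None),            # a chunk with no text content
    _fake_chunk("are red"),
    SimpleNamespace(choices=[]),  # a chunk with no choices
]

full_text = collect_stream(fake_stream)
print(full_text)
```

In real use, pass the iterator returned by `litellm.completion(..., stream=True)` to `collect_stream`.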

Tool Calling

LiteLLM supports tool/function calling. When routed through the gateway, every tool call and its response is captured in the observability dashboard.

import litellm
import os
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the weather in London?"}],
    tools=tools,
    api_base="https://aigateway.leanmcp.com/v1/openai",
    api_key=os.environ["LEANMCP_API_KEY"],
)

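# tool_calls is None when the model answers directly instead of calling a tool
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    for tc in tool_calls:
        print(f"Function: {tc.function.name}")
        # arguments arrives as a JSON string; parse it into a dict
        print(f"Args: {json.loads(tc.function.arguments)}")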

Every tool call shows up in app.leanmcp.com/observability with the full function name, arguments, and the model’s response.
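To complete the request/response cycle, the tool results usually go back to the model in a follow-up call. The sketch below shows one way to execute the requested tools locally and build the follow-up messages in the OpenAI-style `role: "tool"` format. `LOCAL_TOOLS`, `run_tool_calls`, and the stub `get_weather` are hypothetical names for illustration, and the fake tool call stands in for a real response.

```python
import json
from types import SimpleNamespace


def get_weather(city):
    """Hypothetical local implementation; a real one would query a weather API."""
    return f"Weather for {city}: unavailable in this sketch"


# Maps the tool names declared in `tools` to local Python callables.
LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Execute each requested tool and return one 'tool' message per call."""
    messages = []
    for tc in tool_calls or []:
        func = LOCAL_TOOLS[tc.function.name]
        args = json.loads(tc.function.arguments)  # arguments is a JSON string
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tc.id,
                "content": str(func(**args)),
            }
        )
    return messages


# Stand-in shaped like one entry of response.choices[0].message.tool_calls.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "London"}'),
)

tool_messages = run_tool_calls([fake_call])
print(tool_messages)
```

To continue the conversation, append the assistant message and then `tool_messages` to the original `messages` list, and call `litellm.completion()` again with the same api_base and api_key so the follow-up request also passes through the gateway.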

Using with Existing Frameworks

LiteLLM is often used as the LLM backend for evaluation frameworks, agent harnesses, and batch pipelines. You can route all of those calls through the gateway by passing api_base and api_key as extra kwargs.

Example: Evaluation Framework

This pattern comes from a real benchmark runner that uses LiteLLM under the hood. The gateway endpoint and key are passed as JSON kwargs to the framework’s CLI:

# Route all LLM calls through the gateway
LLM_ARGS='{"api_base": "https://aigateway.leanmcp.com/v1/fireworks", "api_key": "'$LEANMCP_API_KEY'"}'

python run_eval.py \
  --model "fireworks_ai/accounts/fireworks/models/llama-v3p1-8b-instruct" \
  --llm-args "$LLM_ARGS" \
  --num-tasks 5

Any framework that forwards kwargs to litellm.completion() will pick up the gateway routing automatically.
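How a framework forwards these kwargs depends on the framework. As a rough sketch of the pattern, a runner like the hypothetical run_eval.py above might parse `--llm-args` as JSON and merge it into the arguments for `litellm.completion()`. The function name and flag handling below are assumptions, not the actual implementation of any specific tool.

```python
import argparse
import json


def build_completion_kwargs(argv):
    """Parse CLI flags and return keyword arguments for litellm.completion()."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True)
    parser.add_argument("--llm-args", default="{}", help="JSON object of extra kwargs")
    parser.add_argument("--num-tasks", type=int, default=1)
    args = parser.parse_args(argv)

    # api_base and api_key from --llm-args end up alongside the model name.
    extra = json.loads(args.llm_args)
    return {"model": args.model, **extra}


completion_kwargs = build_completion_kwargs(
    [
        "--model", "fireworks_ai/accounts/fireworks/models/llama-v3p1-8b-instruct",
        "--llm-args",
        '{"api_base": "https://aigateway.leanmcp.com/v1/fireworks", "api_key": "leanmcp_your_api_key_here"}',
        "--num-tasks", "5",
    ]
)
print(completion_kwargs["api_base"])
```

A runner built this way would then call `litellm.completion(messages=..., **completion_kwargs)` for each task.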

Environment Setup

# .env
LEANMCP_API_KEY=leanmcp_your_api_key_here

# Load the key from .env (requires the python-dotenv package)
from dotenv import load_dotenv
load_dotenv()

import os
api_key = os.environ["LEANMCP_API_KEY"]

Debugging

Turn on LiteLLM verbose logging to see the exact URL, headers, and body of each outgoing request:

import litellm
litellm._turn_on_debug()

Or set the environment variable:

export LITELLM_LOG=DEBUG

This confirms that requests are hitting aigateway.leanmcp.com and not the provider directly.

Troubleshooting

litellm.completion returns an auth error

Verify LEANMCP_API_KEY is set and starts with leanmcp_
Check that the key has SDK permissions at app.leanmcp.com/api-keys
Make sure you have credits in your account
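The first check in this list can be automated before any request is sent. This is a small sketch with a hypothetical helper name, `key_problem`. It covers only what can be verified locally (the variable is set and has the `leanmcp_` prefix); permissions and credits still have to be checked in the app.

```python
import os


def key_problem(env):
    """Return a description of the problem with the key, or None if it looks fine."""
    key = env.get("LEANMCP_API_KEY", "")
    if not key:
        return "LEANMCP_API_KEY is not set"
    if not key.startswith("leanmcp_"):
        return "LEANMCP_API_KEY does not start with 'leanmcp_'"
    return None


# Check the real environment; prints None when the key looks valid.
print(key_problem(os.environ))
```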

Model not found or routing error

Confirm the model string uses the correct LiteLLM prefix (e.g. fireworks_ai/ for Fireworks, anthropic/ for Anthropic)
Verify the api_base matches the provider (e.g. /v1/fireworks for Fireworks models, not /v1/openai)
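Both checks above can be combined into a quick local test that the model prefix and api_base agree. The sketch below uses a hypothetical helper, `routing_problem`, and covers only the two prefixes named in this section.

```python
# LiteLLM model prefix -> gateway path the api_base should end with
EXPECTED_PATH = {
    "fireworks_ai/": "/v1/fireworks",
    "anthropic/": "/v1/anthropic",
}


def routing_problem(model, api_base):
    """Return a description of a prefix/api_base mismatch, or None if none is found."""
    for prefix, path in EXPECTED_PATH.items():
        if model.startswith(prefix) and not api_base.rstrip("/").endswith(path):
            return f"model prefix {prefix!r} expects an api_base ending in {path}"
    return None


# A Fireworks model pointed at the OpenAI gateway path is reported as a mismatch.
print(
    routing_problem(
        "fireworks_ai/accounts/fireworks/models/llama-v3p1-8b-instruct",
        "https://aigateway.leanmcp.com/v1/openai",
    )
)
```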

Requests not showing up in the dashboard

Enable debug logging (litellm._turn_on_debug()) and confirm the request URL starts with https://aigateway.leanmcp.com
Check app.leanmcp.com/observability; requests appear within a few seconds

Streaming not working

Make sure you pass stream=True to litellm.completion()
The gateway supports streaming for all providers. If you get buffered responses, check your HTTP client settings (with cURL, for example, pass -N to disable output buffering)

Next Steps

Observability Dashboard

Inspect every request, response, and token count

SDK Integration

OpenAI and Anthropic SDK examples

Security

Block sensitive data before it reaches providers

Token Optimization

A/B testing and cost reduction
