LiteLLM Integration

LiteLLM is a Python SDK that lets you call 100+ LLM providers through one unified interface. When you point LiteLLM's api_base at the LeanMCP AI Gateway, every request (including tool calls, token usage, and cost) is logged to your observability dashboard, with no changes to the rest of your model code. This is useful when you:
  • Run evaluations or benchmarks across multiple models and want a single place to inspect every call
  • Use tool-calling agents and need to see the full request/response cycle per tool invocation
  • Want cost and latency tracking without adding custom instrumentation

Prerequisites

1. Get Credits: purchase credits at app.leanmcp.com/billing
2. Create API Key: create an API key at app.leanmcp.com/api-keys with SDK permissions

Gateway Endpoints

Provider      Gateway Base URL
OpenAI        https://aigateway.leanmcp.com/v1/openai
Anthropic     https://aigateway.leanmcp.com/v1/anthropic
xAI (Grok)    https://aigateway.leanmcp.com/v1/xai
Fireworks     https://aigateway.leanmcp.com/v1/fireworks
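
The endpoints above can be kept in one place in code so that call sites name a provider instead of repeating URLs. This is a minimal sketch, not part of LiteLLM or LeanMCP: GATEWAY_BASE_URLS and gateway_kwargs are hypothetical names chosen for illustration, and the URLs are copied from the table.

```python
import os

# Base URLs copied from the Gateway Endpoints table.
GATEWAY_BASE_URLS = {
    "openai": "https://aigateway.leanmcp.com/v1/openai",
    "anthropic": "https://aigateway.leanmcp.com/v1/anthropic",
    "xai": "https://aigateway.leanmcp.com/v1/xai",
    "fireworks": "https://aigateway.leanmcp.com/v1/fireworks",
}


def gateway_kwargs(provider, api_key=None):
    """Return api_base/api_key kwargs for one provider (hypothetical helper)."""
    try:
        api_base = GATEWAY_BASE_URLS[provider]
    except KeyError:
        raise ValueError(
            f"No gateway endpoint listed for provider {provider!r}"
        ) from None
    if api_key is None:
        # Fall back to the environment variable used throughout this page.
        api_key = os.environ["LEANMCP_API_KEY"]
    return {"api_base": api_base, "api_key": api_key}
```

The returned dict can be unpacked into a call, for example litellm.completion(model="gpt-4o", messages=messages, **gateway_kwargs("openai")).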

Basic Usage

Pass api_base and api_key to litellm.completion(). LiteLLM then sends the request to the gateway instead of the provider's default endpoint, and the gateway forwards it to the provider.
import litellm
import os

response = litellm.completion(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What is the LeanMCP AI Gateway?"}
    ],
    api_base="https://aigateway.leanmcp.com/v1/openai",
    api_key=os.environ["LEANMCP_API_KEY"],
)

print(response.choices[0].message.content)

Using Different Providers

Swap the api_base URL and use the provider-specific model prefix that LiteLLM expects.

OpenAI

import litellm
import os

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from OpenAI via the gateway."}],
    api_base="https://aigateway.leanmcp.com/v1/openai",
    api_key=os.environ["LEANMCP_API_KEY"],
)

print(response.choices[0].message.content)

Anthropic

import litellm
import os

response = litellm.completion(
    model="anthropic/claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "Hello from Anthropic via the gateway."}],
    api_base="https://aigateway.leanmcp.com/v1/anthropic",
    api_key=os.environ["LEANMCP_API_KEY"],
)

print(response.choices[0].message.content)

Fireworks

LiteLLM requires the fireworks_ai/ prefix for Fireworks models.
import litellm
import os

response = litellm.completion(
    model="fireworks_ai/accounts/fireworks/models/llama-v3p1-8b-instruct",
    messages=[{"role": "user", "content": "Hello from Fireworks via the gateway."}],
    max_tokens=1024,
    temperature=0.0,
    api_base="https://aigateway.leanmcp.com/v1/fireworks",
    api_key=os.environ["LEANMCP_API_KEY"],
)

print(response.choices[0].message.content)

Streaming

Streaming works the same way. Set stream=True and iterate over chunks.
import litellm
import os

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem."}],
    api_base="https://aigateway.leanmcp.com/v1/openai",
    api_key=os.environ["LEANMCP_API_KEY"],
    stream=True,
)

for chunk in response:
    content = chunk.choices[0].delta.content or ""
    print(content, end="", flush=True)
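
If the complete text is needed after streaming (for logging or for scoring in an evaluation), the chunks can be accumulated while they are printed. This is a minimal sketch: collect_stream is a hypothetical helper, and it assumes each chunk exposes the choices[0].delta.content shape used in the loop above.

```python
def collect_stream(chunks, echo=True):
    """Consume a streaming response and return the concatenated text."""
    parts = []
    for chunk in chunks:
        # Defensive: skip chunks that carry no choices (assumed possible
        # for metadata-only chunks).
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content or ""
        if echo:
            print(content, end="", flush=True)
        parts.append(content)
    return "".join(parts)
```

With the response object from the example above, full_text = collect_stream(response) prints the tokens as they arrive and returns the whole reply.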

Tool Calling

LiteLLM supports tool/function calling. When routed through the gateway, every tool call and its response is captured in the observability dashboard.
import litellm
import os
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the weather in London?"}],
    tools=tools,
    api_base="https://aigateway.leanmcp.com/v1/openai",
    api_key=os.environ["LEANMCP_API_KEY"],
)

tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    for tc in tool_calls:
        print(f"Function: {tc.function.name}")
        print(f"Args: {tc.function.arguments}")
Every tool call shows up in app.leanmcp.com/observability with the full function name, arguments, and the model’s response.
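
The example above stops at printing the tool call. To complete the cycle, the application runs the function and sends the result back as a "tool" message. The following is a minimal sketch under stated assumptions: get_weather is a stub, TOOL_REGISTRY and run_tool_call are hypothetical names, and the tool call is assumed to have the OpenAI-style fields id, function.name, and function.arguments (a JSON string).

```python
import json


def get_weather(city):
    """Stub implementation; a real version would query a weather service."""
    return {"city": city, "forecast": "unknown"}


# Maps tool names from the `tools` schema to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call):
    """Execute one tool call and return the follow-up "tool" message."""
    func = TOOL_REGISTRY[tool_call.function.name]
    # The model returns the arguments as a JSON-encoded string.
    args = json.loads(tool_call.function.arguments)
    result = func(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    }
```

Append the assistant message and each returned tool message to messages, then call litellm.completion() again with the same api_base and api_key so that the second round trip also passes through the gateway.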

Using with Existing Frameworks

LiteLLM is often used as the LLM backend for evaluation frameworks, agent harnesses, and batch pipelines. You can route all of those calls through the gateway by passing api_base and api_key as extra kwargs.

Example: Evaluation Framework

This pattern comes from a real benchmark runner that uses LiteLLM under the hood. The gateway endpoint and key are passed as JSON kwargs to the framework’s CLI:
# Route all LLM calls through the gateway
LLM_ARGS='{"api_base": "https://aigateway.leanmcp.com/v1/fireworks", "api_key": "'"$LEANMCP_API_KEY"'"}'

python run_eval.py \
  --model "fireworks_ai/accounts/fireworks/models/llama-v3p1-8b-instruct" \
  --llm-args "$LLM_ARGS" \
  --num-tasks 5
Any framework that forwards kwargs to litellm.completion() will pick up the gateway routing automatically.
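
Splicing the key into JSON with shell quoting is easy to get wrong, so it can help to build the string in Python instead. The sketch below also shows roughly how a framework might merge such a flag into its call. build_llm_args and merge_llm_args are hypothetical names, and real frameworks may parse their flags differently.

```python
import json


def build_llm_args(api_base, api_key):
    """Serialize gateway settings for a --llm-args style flag."""
    return json.dumps({"api_base": api_base, "api_key": api_key})


def merge_llm_args(llm_args_json, **call_kwargs):
    """Merge CLI-supplied kwargs over a framework's own call kwargs."""
    extra = json.loads(llm_args_json)
    if not isinstance(extra, dict):
        raise ValueError("--llm-args must be a JSON object")
    # Values from the CLI override the framework defaults.
    return {**call_kwargs, **extra}
```

A framework following this pattern would then call litellm.completion(**merged), which is why the gateway settings take effect without changes to the framework itself.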

Environment Setup

# .env
LEANMCP_API_KEY=leanmcp_your_api_key_here

# Load the key from .env (requires the python-dotenv package)
from dotenv import load_dotenv
load_dotenv()

import os
api_key = os.environ["LEANMCP_API_KEY"]
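
A bare os.environ lookup raises KeyError when the variable is missing, which can be hard to read in a long traceback. The sketch below fails with a clearer message and also checks the leanmcp_ prefix mentioned under Troubleshooting. require_gateway_key is a hypothetical helper, not part of any SDK.

```python
import os


def require_gateway_key(env=None):
    """Return LEANMCP_API_KEY or raise a descriptive error."""
    env = os.environ if env is None else env
    key = env.get("LEANMCP_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "LEANMCP_API_KEY is not set; add it to .env or export it"
        )
    if not key.startswith("leanmcp_"):
        raise RuntimeError("LEANMCP_API_KEY should start with 'leanmcp_'")
    return key
```

Call it once after load_dotenv() and pass the result as api_key.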

Debugging

Turn on LiteLLM verbose logging to see the exact URL, headers, and body of each outgoing request:
import litellm
litellm._turn_on_debug()
Or set the environment variable:
export LITELLM_LOG=DEBUG
This confirms that requests are hitting aigateway.leanmcp.com and not the provider directly.

Troubleshooting

  • Verify LEANMCP_API_KEY is set and starts with leanmcp_
  • Check that the key has SDK permissions at app.leanmcp.com/api-keys
  • Make sure you have credits in your account
  • Confirm the model string uses the correct LiteLLM prefix (e.g. fireworks_ai/ for Fireworks, anthropic/ for Anthropic)
  • Verify the api_base matches the provider (e.g. /v1/fireworks for Fireworks models, not /v1/openai)
  • Enable debug logging (litellm._turn_on_debug()) and confirm the request URL starts with https://aigateway.leanmcp.com
  • Check app.leanmcp.com/observability; requests appear within a few seconds
  • If streaming does not work, make sure you pass stream=True to litellm.completion()
  • The gateway supports streaming for all providers. If responses still arrive buffered, check your HTTP client settings
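
Several of the checks above can be automated before a request is sent. This is a minimal sketch: preflight is a hypothetical helper, and the prefix-to-path mapping is an assumption built from the examples on this page (the xai and openai prefixes in particular are assumed, not confirmed here). Models without a prefix are treated as OpenAI, as in the examples.

```python
GATEWAY_HOST = "https://aigateway.leanmcp.com"

# Assumed mapping from LiteLLM model prefix to gateway path segment.
PREFIX_TO_PATH = {
    "openai": "/v1/openai",
    "anthropic": "/v1/anthropic",
    "xai": "/v1/xai",
    "fireworks_ai": "/v1/fireworks",
}


def preflight(model, api_base, api_key):
    """Return a list of likely configuration problems (empty if none found)."""
    problems = []
    if not api_key or not api_key.startswith("leanmcp_"):
        problems.append("api_key is missing or does not start with 'leanmcp_'")
    if not api_base.startswith(GATEWAY_HOST):
        problems.append("api_base does not point at the gateway")
    # Unprefixed model names are treated as OpenAI models.
    prefix = model.split("/", 1)[0] if "/" in model else "openai"
    expected = PREFIX_TO_PATH.get(prefix)
    if expected is None:
        problems.append(f"unrecognized model prefix {prefix!r}")
    elif not api_base.rstrip("/").endswith(expected):
        problems.append(f"model {model!r} expects an api_base ending in {expected}")
    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that the key has SDK permissions or that the account has credits.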

Next Steps

  • Observability Dashboard: inspect every request, response, and token count
  • SDK Integration: OpenAI and Anthropic SDK examples
  • Security: block sensitive data before it reaches providers
  • Token Optimization: A/B testing and cost reduction