Claude Code only speaks the Anthropic API. If you want to use alternative LLM providers (GLM, Kimi, OpenRouter, etc.) or rotate multiple Claude OAuth tokens across a team, there is no built-in way to do it.
The solution is a LiteLLM proxy with a custom request transformer that converts Anthropic API calls into requests compatible with various providers. Claude Code connects to the proxy as if it were talking to Anthropic, and the proxy handles the rest.
1. Configure environment:

   ```shell
   cp .env.example .env
   cp config.yaml.example config.yaml
   ```

2. Edit `.env` and fill in your API keys:

   ```shell
   ZAI_API_KEY=...
   KIMI_API_KEY=...
   OPENROUTER_API_KEY=...
   CLAUDE_OAUTH_TOKENS=sk-ant-oat-token1,sk-ant-oat-token2
   PROXY_STATIC_TOKEN=...
   LITELLM_MASTER_KEY=...
   ```

   Generate tokens with:

   ```shell
   openssl rand -hex 32
   ```

3. Customize `config.yaml` for your deployment (add/remove models, adjust fallbacks).

4. Start the proxy:

   ```shell
   docker compose up -d
   ```
Add to your Claude Code settings (`~/.claude/settings.json` or per-project `.claude/settings.json`):
```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://your-server:4000",
    "ANTHROPIC_MODEL": "glm",
    "ANTHROPIC_CUSTOM_HEADERS": "x-litellm-api-key: YOUR_PROXY_STATIC_TOKEN",
    "API_TIMEOUT_MS": "3000000",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
```

- `ANTHROPIC_BASE_URL` — point to your proxy instance
- `ANTHROPIC_MODEL` — the model alias from your `config.yaml` (e.g., `glm`, `kimi`, `openrouter`)
- `ANTHROPIC_CUSTOM_HEADERS` — passes the static auth token; the format is `x-litellm-api-key: YOUR_PROXY_STATIC_TOKEN` (no "Bearer" prefix)
- `API_TIMEOUT_MS` — request timeout in milliseconds (useful for slow models)
- `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC` — disables telemetry and non-essential network requests
To skip the Claude Code onboarding screen, add to `~/.claude.json`:

```json
{
  "hasCompletedOnboarding": true
}
```

The proxy includes `request_transformer.py` — a custom LiteLLM callback that solves several compatibility problems when routing Anthropic API traffic to different providers:
- **Claude OAuth Token Rotation** — Maintains a pool of Claude OAuth tokens (the `CLAUDE_OAUTH_TOKENS` env var) and round-robins across them. Tokens that hit rate limits are automatically placed on a 60-second cooldown. When all tokens are cooling down, the one expiring soonest is used.
- **OAuth Header Stripping** — Claude Code sends OAuth `Authorization` headers with every request. Non-Anthropic providers (GLM, Kimi) reject these headers, so the transformer recursively strips them before forwarding.
- **Thinking Block Removal (Requests)** — Some providers return "thinking" blocks in responses, and Claude Code caches these in conversation history. When the conversation later routes to Anthropic, the cached blocks cause validation errors because Anthropic requires non-empty thinking content. The transformer strips thinking blocks from outgoing messages.
- **Thinking Block Removal (Responses)** — OpenRouter returns reasoning/thinking blocks that would pollute the conversation cache. The transformer strips them from responses at the source.
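The request-side stripping amounts to filtering content blocks by type. The sketch below is an illustration of that idea, not the actual `request_transformer.py` code; it assumes Anthropic-style message dicts whose `content` may be either a string or a list of typed blocks:

```python
def strip_thinking_blocks(messages):
    """Drop 'thinking' content blocks from Anthropic-style messages so
    cached reasoning from other providers never reaches Anthropic."""
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            # Keep every block except thinking / redacted_thinking.
            content = [
                block for block in content
                if not (isinstance(block, dict)
                        and block.get("type") in ("thinking", "redacted_thinking"))
            ]
        cleaned.append({**msg, "content": content})
    return cleaned
```

String-valued content (plain user messages) passes through untouched; only list-valued content is filtered.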
`custom_auth.py` provides lightweight authentication without a database. Set the `PROXY_STATIC_TOKEN` env var, and every request must include a matching token via the `x-litellm-api-key` header. If `PROXY_STATIC_TOKEN` is not set, all requests are allowed.

This is the "light mode" — no database required. For per-user tokens and usage tracking, you can optionally configure LiteLLM's built-in database-backed auth instead.
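The check described above boils down to a few lines. This is a sketch of the logic, not the real `custom_auth.py` (which plugs into LiteLLM's auth hook):

```python
import os

def is_authorized(headers: dict) -> bool:
    """Allow a request iff its x-litellm-api-key header matches
    PROXY_STATIC_TOKEN; if no token is configured, allow everything."""
    expected = os.environ.get("PROXY_STATIC_TOKEN")
    if not expected:
        return True  # "light mode" with auth disabled
    return headers.get("x-litellm-api-key") == expected
```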
If your team uses multiple Claude Code Link sessions, you can pool the OAuth tokens for load balancing:

- Each team member completes the Claude Code OAuth flow (Claude Code Link) to obtain an `sk-ant-oat-...` token
- Collect the tokens and set them as a comma-separated list in the `CLAUDE_OAUTH_TOKENS` env var
- The proxy rotates through tokens automatically, cooling down rate-limited ones
This allows the team to share Claude API capacity across multiple live sessions.
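The rotate-with-cooldown policy can be sketched in a few lines of Python. This is a hypothetical illustration of the behavior described above (round-robin, 60-second cooldown, soonest-to-recover fallback), not the proxy's actual implementation:

```python
import time

class TokenPool:
    """Round-robin over OAuth tokens; rate-limited tokens cool down for 60 s."""

    COOLDOWN_SECONDS = 60

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.index = 0
        self.cooldown_until = {}  # token -> unix time when it is usable again

    def mark_rate_limited(self, token, now=None):
        now = time.time() if now is None else now
        self.cooldown_until[token] = now + self.COOLDOWN_SECONDS

    def next_token(self, now=None):
        now = time.time() if now is None else now
        # Try each token once, starting from the rotation cursor.
        for _ in range(len(self.tokens)):
            token = self.tokens[self.index]
            self.index = (self.index + 1) % len(self.tokens)
            if self.cooldown_until.get(token, 0) <= now:
                return token
        # Everything is cooling down: pick the token that recovers soonest.
        return min(self.tokens, key=lambda t: self.cooldown_until.get(t, 0))
```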
By default, the proxy rotates tokens from the CLAUDE_OAUTH_TOKENS pool for all Claude requests. You can switch to passthrough mode where each client's own session token is forwarded to Claude API as-is.
Global setting (env var):

```shell
CLAUDE_TOKEN_MODE=passthrough  # or "rotate" (default)
```

Per-request override (header): clients can override the global mode by adding `x-claude-token-mode` to their custom headers:
```json
{
  "env": {
    "ANTHROPIC_CUSTOM_HEADERS": "x-litellm-api-key: YOUR_TOKEN, x-claude-token-mode: passthrough"
  }
}
```

| Mode | Behavior |
|---|---|
| `rotate` | Server picks the next token from the `CLAUDE_OAUTH_TOKENS` pool (default) |
| `passthrough` | Client's own `Authorization` token is forwarded to Claude API |

If mode is `rotate` but no tokens are configured in `CLAUDE_OAUTH_TOKENS`, the proxy automatically falls back to `passthrough`.
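The precedence rules (per-request header beats the global env var; an empty pool forces passthrough) can be captured in one small function. An illustrative sketch, not the proxy's actual code:

```python
import os

def resolve_token_mode(headers: dict, pool: list) -> str:
    """Effective Claude token mode for one request: the
    x-claude-token-mode header overrides CLAUDE_TOKEN_MODE,
    and an empty token pool forces passthrough."""
    mode = headers.get("x-claude-token-mode") \
        or os.environ.get("CLAUDE_TOKEN_MODE", "rotate")
    if mode == "rotate" and not pool:
        return "passthrough"  # nothing to rotate: forward the client's own token
    return mode
```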
The proxy routes requests to different providers based on the model name. Supported models are configured in `config.yaml`:
| Model | Provider | Description |
|---|---|---|
| `glm` | ZAI API | GLM-5 |
| `glm-fast` | ZAI API | GLM-5-AIR (faster) |
| `kimi` | Kimi API | Kimi-for-coding |
| `openrouter` | OpenRouter | Qwen 3.6 Plus (free) |
| `anthropic/*` | Anthropic | All Claude models via OAuth wildcard |
Fallbacks: `anthropic/*` -> `glm`, `anthropic/claude-haiku-*` -> `glm-fast`
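For orientation, a single entry in LiteLLM's `model_list` format might look like the fragment below. The model ID, endpoint URL, and fallback mapping are illustrative assumptions; see `config.yaml.example` for the real values:

```yaml
model_list:
  - model_name: glm                     # alias used as ANTHROPIC_MODEL
    litellm_params:
      model: openai/glm-5               # provider-side model ID (assumed)
      api_base: https://api.example.com/v1   # provider endpoint (placeholder)
      api_key: os.environ/ZAI_API_KEY   # LiteLLM reads the key from the env

litellm_settings:
  fallbacks:
    - "anthropic/*": ["glm"]            # reroute failed Claude calls to glm
```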
- `config.yaml` — Model routing, fallbacks, and LiteLLM settings. Customized per deployment; see `config.yaml.example` for the reference configuration.
- `.env` — API keys and tokens. See `.env.example` for all required variables.
- `request_transformer.py` — Request/response transformation callback.
- `custom_auth.py` — Static token authentication handler.