AnswerQA

How do I route Claude Code through our LiteLLM proxy for cost tracking and audit?

Answer

Set ANTHROPIC_BASE_URL to your LiteLLM endpoint, choose between unified Anthropic format and pass-through provider endpoints, configure authentication with ANTHROPIC_AUTH_TOKEN or apiKeyHelper, and avoid the compromised LiteLLM PyPI versions 1.82.7 and 1.82.8.

By Kalle Lamminpää. Verified May 12, 2026.

A gateway between Claude Code and the Anthropic API gives you one place to track spend per team, rotate credentials, enforce budgets, and produce an audit log. Every Claude Code request gets the X-Claude-Code-Session-Id header attached, which a gateway can use to aggregate all calls from a single session without parsing request bodies.

Step 1: Verify your gateway format

Your gateway must expose at least one of these API formats without stripping required headers or body fields:

Format | Path | Required passthrough
Anthropic Messages | /v1/messages, /v1/messages/count_tokens | anthropic-beta, anthropic-version headers
Bedrock InvokeModel | /invoke, /invoke-with-response-stream | anthropic_beta, anthropic_version body fields
Vertex rawPredict | :rawPredict, :streamRawPredict | anthropic-beta, anthropic-version headers

If a gateway strips the anthropic-beta header, prompt caching, extended thinking, and other beta features stop working silently.
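Before pointing Claude Code at the gateway, it can help to confirm that an Anthropic Messages path is actually routed. The following is a minimal sketch, not an official check: GATEWAY_URL, GATEWAY_KEY, MODEL, RUN_GATEWAY_SMOKE_TEST, and the function name are all hypothetical placeholders, and it assumes the gateway accepts a bearer token.

```shell
# Hypothetical helper: send one count_tokens request through the gateway
# and print only the HTTP status code.
# GATEWAY_URL, GATEWAY_KEY, and MODEL are placeholder names, not Claude Code settings.
gateway_smoke_test() {
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -X POST "${GATEWAY_URL}/v1/messages/count_tokens" \
    -H "Authorization: Bearer ${GATEWAY_KEY}" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d "{\"model\": \"${MODEL}\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}]}"
}

# Run only when explicitly requested, so sourcing this file has no side effects.
if [ "${RUN_GATEWAY_SMOKE_TEST:-0}" = "1" ]; then
  gateway_smoke_test
fi
```

A 200 here only shows that the path is reachable and authenticated. It does not prove that anthropic-beta is forwarded, which still has to be checked on the gateway side or through feature behavior.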

Step 2: Point Claude Code at the gateway

For a LiteLLM server exposing the Anthropic Messages format (recommended):

export ANTHROPIC_BASE_URL=https://litellm.yourcompany.com

That’s it. Claude Code sends all requests to that base URL. The model names stay the same.

For provider-specific pass-through endpoints:

# Bedrock via LiteLLM
export ANTHROPIC_BEDROCK_BASE_URL=https://litellm.yourcompany.com/bedrock
export CLAUDE_CODE_SKIP_BEDROCK_AUTH=1
export CLAUDE_CODE_USE_BEDROCK=1

# Vertex via LiteLLM
export ANTHROPIC_VERTEX_BASE_URL=https://litellm.yourcompany.com/vertex_ai/v1
export ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project
export CLAUDE_CODE_SKIP_VERTEX_AUTH=1
export CLAUDE_CODE_USE_VERTEX=1
export CLOUD_ML_REGION=us-east5

Pass-through endpoints skip load balancing and fallback logic that the unified endpoint provides. Use the unified Anthropic Messages format unless you need direct pass-through for compliance reasons.
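Because the unified and pass-through setups use different variables, it is easy to leave a stale one exported. As a rough preflight sketch, the function below reports which of the setups above the current environment selects. The function name and the "conflict" label are this sketch's own; how Claude Code itself behaves when both provider flags are set is not covered here.

```shell
# Hypothetical preflight: report which routing setup the environment selects.
routing_mode() {
  bedrock="${CLAUDE_CODE_USE_BEDROCK:-0}"
  vertex="${CLAUDE_CODE_USE_VERTEX:-0}"
  if [ "$bedrock" = "1" ] && [ "$vertex" = "1" ]; then
    echo "conflict: both CLAUDE_CODE_USE_BEDROCK and CLAUDE_CODE_USE_VERTEX are set"
  elif [ "$bedrock" = "1" ]; then
    echo "bedrock pass-through via ${ANTHROPIC_BEDROCK_BASE_URL:-unset}"
  elif [ "$vertex" = "1" ]; then
    echo "vertex pass-through via ${ANTHROPIC_VERTEX_BASE_URL:-unset}"
  elif [ -n "${ANTHROPIC_BASE_URL:-}" ]; then
    echo "anthropic messages via ${ANTHROPIC_BASE_URL}"
  else
    echo "direct: no gateway variables set"
  fi
}

routing_mode
```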

Step 3: Configure authentication

Static key (simplest):

export ANTHROPIC_AUTH_TOKEN=sk-your-litellm-master-key

This sends the value as an Authorization: Bearer header. Use it for shared team access where key rotation is infrequent.

Dynamic key helper (for rotating credentials or per-user JWT):

{
  "apiKeyHelper": "~/bin/get-litellm-key.sh"
}

#!/bin/bash
# ~/bin/get-litellm-key.sh
vault kv get -field=api_key secret/litellm/claude-code

Set the TTL to match your token expiry:

export CLAUDE_CODE_API_KEY_HELPER_TTL_MS=3600000  # 1 hour

The helper runs before each session. apiKeyHelper has lower precedence than ANTHROPIC_AUTH_TOKEN, so unset the static token if you want the helper to take effect.

Step 4: Enable model discovery (optional)

When your gateway exposes /v1/models, Claude Code can query it at startup and add the available models to the /model picker:

export CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1

Requires Claude Code v2.1.129 or later. Discovery only runs for the Anthropic Messages format and only when ANTHROPIC_BASE_URL is set to something other than api.anthropic.com. Results are cached to ~/.claude/cache/gateway-models.json.

Step 5: Fix prompt cache key collisions (optional)

Claude Code prepends a short attribution block to each system prompt containing a client fingerprint. The Anthropic API strips this before processing, so it does not affect Anthropic’s prompt cache. If your gateway implements its own cache keyed on the full request body, this block breaks cache hits across sessions. Disable it:

export CLAUDE_CODE_ATTRIBUTION_HEADER=0

Footguns

LiteLLM PyPI 1.82.7 and 1.82.8 were compromised with credential-stealing malware. If you installed either version, rotate all credentials that were on those machines and follow the remediation steps in BerriAI/litellm#24518. Check your current version with pip show litellm. The supply chain compromise is the reason Anthropic docs include an unusually explicit warning about a third-party package.
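The version check above can be scripted so it runs the same way on every machine. This is a minimal sketch: the function name is hypothetical, and it only inspects the Python environment that the pip on your PATH belongs to, so other virtualenvs and containers need their own run.

```shell
# Returns 0 (true) when the given version is one of the two compromised
# releases named above, 1 otherwise.
is_compromised_litellm() {
  case "$1" in
    1.82.7|1.82.8) return 0 ;;
    *) return 1 ;;
  esac
}

# Read the installed version from `pip show litellm`; empty if not installed.
installed=$(pip show litellm 2>/dev/null | awk '/^Version:/ {print $2}')

if [ -z "$installed" ]; then
  echo "litellm not found in this environment"
elif is_compromised_litellm "$installed"; then
  echo "litellm $installed is a compromised release: rotate credentials"
else
  echo "litellm $installed is not one of the two compromised releases"
fi
```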

ANTHROPIC_AUTH_TOKEN takes precedence over apiKeyHelper. If you set both, the static token always wins. Developers who add apiKeyHelper without unsetting the static token will see the static token used, silently bypassing the key rotation workflow.
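A small guard can catch this shadowing before it bites. The sketch below is illustrative only: the function name is hypothetical, and it uses a plain text match rather than a JSON parser, so pass it the path of whichever settings file holds your apiKeyHelper entry.

```shell
# Hypothetical check: warn when a settings file defines apiKeyHelper while
# ANTHROPIC_AUTH_TOKEN is also set in the environment.
# Usage: check_auth_conflict /path/to/settings.json
check_auth_conflict() {
  settings_file="$1"
  if [ -n "${ANTHROPIC_AUTH_TOKEN:-}" ] && grep -q '"apiKeyHelper"' "$settings_file" 2>/dev/null; then
    echo "conflict: ANTHROPIC_AUTH_TOKEN is set, so apiKeyHelper in $settings_file is shadowed"
    return 1
  fi
  echo "ok: no static token shadowing apiKeyHelper"
}
```

Running it in a shell profile or a CI preflight step makes the conflict visible instead of silent.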

Using Bedrock or Vertex format via the gateway disables server-managed settings. Server-managed org policy requires a direct connection to api.anthropic.com. When Claude Code is pointed at a gateway with CLAUDE_CODE_USE_BEDROCK=1 or CLAUDE_CODE_USE_VERTEX=1, managed settings are not delivered. If your org relies on server-managed permission policy, use the Anthropic Messages format via ANTHROPIC_BASE_URL instead.

Gateway model discovery surfaces every model the shared API key can reach. CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 shows all models returned by /v1/models. For a gateway backed by a shared key, this can expose model names your org did not intend to make visible to individual developers. Leave discovery off unless you control which models the gateway returns.
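To see what discovery would surface before enabling it, you can list the model IDs the gateway returns. This is a sketch under assumptions: it assumes the response has the common {"data": [{"id": "..."}]} shape, the function name is hypothetical, and the sample body is made up.

```shell
# Hypothetical helper: print the "id" of every model in a /v1/models
# response read from stdin. Uses a text match, not a full JSON parser.
list_model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Offline demonstration with a made-up response body.
printf '%s' '{"data":[{"id":"model-a","object":"model"},{"id":"model-b","object":"model"}]}' | list_model_ids
```

Against a live gateway, something like curl -sS -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN" "$ANTHROPIC_BASE_URL/v1/models" | list_model_ids should produce the same kind of list, assuming the gateway accepts that token on the models path.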

Missing beta header passthrough causes silent feature degradation. A gateway that strips anthropic-beta will not break basic completion but will silently disable features like prompt caching. Test with claude -p "echo hello" --output-format json and check total_cost_usd. If caching is working, repeated identical prompts should show lower cost on subsequent calls.
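The repeated-prompt comparison described above can be scripted roughly as follows. It is a sketch, not a definitive test: the helper names and the RUN_CACHE_CHECK switch are hypothetical, the cost is extracted with a text match rather than a JSON parser, and a cost drop is only a hint that caching is active.

```shell
# Pull the numeric total_cost_usd value out of JSON read from stdin.
extract_cost() {
  grep -o '"total_cost_usd"[[:space:]]*:[[:space:]]*[0-9.eE+-]*' | sed 's/.*:[[:space:]]*//'
}

# Succeeds when the second cost is strictly lower than the first.
cost_dropped() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(b + 0 < a + 0) }'
}

# Run the same prompt twice and compare. Skipped unless the claude CLI is
# installed and RUN_CACHE_CHECK=1 is set, because it makes billable requests.
if command -v claude >/dev/null 2>&1 && [ "${RUN_CACHE_CHECK:-0}" = "1" ]; then
  first=$(claude -p "echo hello" --output-format json | extract_cost)
  second=$(claude -p "echo hello" --output-format json | extract_cost)
  if cost_dropped "$first" "$second"; then
    echo "second call cheaper ($first -> $second): caching looks active"
  else
    echo "no cost drop ($first -> $second): check anthropic-beta passthrough"
  fi
fi
```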

Session attribution for proxy-side aggregation

Claude Code attaches X-Claude-Code-Session-Id to every request. Your gateway can use this to aggregate all API calls from a single conversation without parsing request bodies. For parallel subagents, X-Claude-Code-Agent-Id identifies the individual agent that made the request, and X-Claude-Code-Parent-Agent-Id identifies who spawned it. Both are ephemeral per-spawn identifiers, not persistent user IDs.
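As an illustration of proxy-side aggregation, the sketch below groups a made-up gateway log by session ID. The log format (session ID, agent ID, input tokens, output tokens, separated by spaces) is entirely hypothetical; a real gateway will log differently, and only the idea of grouping on the X-Claude-Code-Session-Id value comes from the text above.

```shell
# Made-up gateway log: session id, agent id, input tokens, output tokens.
sample_log='sess-1 agent-a 1200 300
sess-1 agent-b 800 150
sess-2 agent-c 500 100'

# Sum request count and tokens per session id (first column).
aggregate_by_session() {
  awk '{ reqs[$1]++; tin[$1] += $3; tout[$1] += $4 }
       END { for (s in reqs) printf "%s requests=%d input=%d output=%d\n", s, reqs[s], tin[s], tout[s] }' | sort
}

printf '%s\n' "$sample_log" | aggregate_by_session
```

Grouping on the second column instead would give per-agent totals, which is where X-Claude-Code-Agent-Id becomes useful for parallel subagents.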

When NOT to route through a gateway

  • You use server-managed org policy. Gateways using Bedrock/Vertex/Foundry format bypass server-managed settings delivery.
  • Your gateway does not forward anthropic-beta. You will lose prompt caching, extended thinking, and any feature gated behind beta headers.
  • You are on the Claude Max personal plan. Gateways are an enterprise pattern, and the per-session cost structure of Max does not benefit from gateway-side cost tracking.

Sources

  • LLM gateway configuration
    Canonical reference for gateway API format requirements, authentication methods, model discovery, attribution header, and LiteLLM supply chain warning.
  • Claude Code with LiteLLM
    Practical walkthrough of unified endpoint configuration and common auth patterns.
