How do I route Claude Code through our LiteLLM proxy for cost tracking and audit?

Question

Kalle Lamminpää · Accepted Answer

A gateway between Claude Code and the Anthropic API gives you one place to track spend per team, rotate credentials, enforce budgets, and produce an audit log. Every Claude Code request gets the X-Claude-Code-Session-Id header attached, which a gateway can use to aggregate all calls from a single session without parsing request bodies.

Step 1: Verify your gateway format

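Your gateway must expose at least one of the following API formats without stripping required headers or body fields: Anthropic Messages, Bedrock InvokeModel, or Vertex rawPredict. The format table near the end of this answer lists the paths and required passthrough for each.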

If a gateway strips the anthropic-beta header, features such as prompt caching and extended thinking stop working silently, along with any other beta feature.

Step 2: Point Claude Code at the gateway

For a LiteLLM server exposing the Anthropic Messages format (recommended):
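A minimal sketch of this setting, assuming a LiteLLM proxy reachable at the hypothetical host litellm-server on port 4000 (LiteLLM's default proxy port). Substitute your own gateway URL:

```shell
# Hypothetical gateway address; replace with your LiteLLM proxy URL.
export ANTHROPIC_BASE_URL="https://litellm-server:4000"
```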

That's it. Claude Code sends all requests to that base URL. The model names stay the same.

For provider-specific pass-through endpoints:
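As a sketch, assuming LiteLLM's pass-through routes are mounted at /anthropic and /bedrock on the same hypothetical host. The host and route paths here are assumptions, so check them against your proxy configuration:

```shell
# Option A: Claude API pass-through (assumed route: /anthropic).
export ANTHROPIC_BASE_URL="https://litellm-server:4000/anthropic"

# Option B: Bedrock pass-through (assumed route: /bedrock).
# Uncomment these instead of Option A if the gateway speaks Bedrock format.
# export ANTHROPIC_BEDROCK_BASE_URL="https://litellm-server:4000/bedrock"
# export CLAUDE_CODE_SKIP_BEDROCK_AUTH=1   # the gateway holds the AWS credentials
# export CLAUDE_CODE_USE_BEDROCK=1
```

Only one option should be active at a time; setting both leaves it unclear which format Claude Code will speak.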

Pass-through endpoints skip load balancing and fallback logic that the unified endpoint provides. Use the unified Anthropic Messages format unless you need direct pass-through for compliance reasons.

Step 3: Configure authentication

Static key (simplest):
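A minimal sketch, using a placeholder value rather than a real key:

```shell
# Placeholder value; use the key issued by your LiteLLM admin.
export ANTHROPIC_AUTH_TOKEN="sk-litellm-static-key"
```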

Setting ANTHROPIC_AUTH_TOKEN sends the value as an Authorization: Bearer header. Use it for shared team access where key rotation is infrequent.

Dynamic key helper (for rotating credentials or per-user JWT):
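A sketch of what such a helper might look like, assuming the token lives in a local file that some other process keeps fresh. The function name get_litellm_key, the LITELLM_TOKEN_FILE variable, and the default path are all hypothetical; a real helper would more likely call your vault or SSO tooling. Claude Code reads the key from the helper's standard output, and the apiKeyHelper setting in your Claude Code settings file should point at the script.

```shell
# Hypothetical helper: print the current gateway key on stdout.
get_litellm_key() {
  # Assumed token location; override with LITELLM_TOKEN_FILE.
  token_file="${LITELLM_TOKEN_FILE:-$HOME/.config/litellm/token}"
  if [ ! -r "$token_file" ]; then
    echo "token file not readable: $token_file" >&2
    return 1
  fi
  # Strip line endings so only the key itself is emitted.
  tr -d '\r\n' < "$token_file"
}

# When saved as a standalone script, end the file with:
#   get_litellm_key
```

Keeping the lookup in one small function makes it easy to swap the file read for a vault call later without touching the Claude Code configuration.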

Set the TTL to match your token expiry:
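For instance, assuming tokens that expire after one hour, the refresh interval can be given in milliseconds through CLAUDE_CODE_API_KEY_HELPER_TTL_MS. The one-hour figure is an assumption for illustration:

```shell
# One hour, expressed in milliseconds (60 min * 60 s * 1000 ms).
export CLAUDE_CODE_API_KEY_HELPER_TTL_MS=$((60 * 60 * 1000))
```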

The helper runs before each session. apiKeyHelper has lower precedence than ANTHROPIC_AUTH_TOKEN, so unset the static token if you want the helper to take effect.

Step 4: Enable model discovery (optional)

When your gateway exposes /v1/models, Claude Code can query it at startup and add the available models to the /model picker:
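Assuming the flag named in the Footguns section below is what turns discovery on, enabling it is a single export alongside the base URL. The gateway address is the same hypothetical host used earlier:

```shell
export ANTHROPIC_BASE_URL="https://litellm-server:4000"   # hypothetical gateway URL
export CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1        # opt in to /v1/models discovery
```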

Requires Claude Code v2.1.129 or later. Discovery only runs for the Anthropic Messages format and only when ANTHROPIC_BASE_URL is set to something other than api.anthropic.com. Results are cached to ~/.claude/cache/gateway-models.json.

Step 5: Fix prompt cache key collisions (optional)

Claude Code prepends a short attribution block to each system prompt containing a client fingerprint. The Anthropic API strips this before processing, so it does not affect Anthropic's prompt cache. If your gateway implements its own cache keyed on the full request body, this block breaks cache hits across sessions. Disable it:

Footguns

LiteLLM PyPI 1.82.7 and 1.82.8 were compromised with credential-stealing malware. If you installed either version, rotate all credentials that were on those machines and follow the remediation steps in BerriAI/litellm#24518. Check your current version with pip show litellm. The supply chain compromise is the reason Anthropic docs include an unusually explicit warning about a third-party package.

ANTHROPIC_AUTH_TOKEN takes precedence over apiKeyHelper. If you set both, the static token always wins. Developers who add apiKeyHelper without unsetting the static token will see the static token used, silently bypassing the key rotation workflow.

Using Bedrock or Vertex format via the gateway disables server-managed settings. Server-managed org policy requires a direct connection to api.anthropic.com. When Claude Code is pointed at a gateway with CLAUDE_CODE_USE_BEDROCK=1 or CLAUDE_CODE_USE_VERTEX=1, managed settings are not delivered. If your org relies on server-managed permission policy, use the Anthropic Messages format via ANTHROPIC_BASE_URL instead.

Gateway model discovery surfaces every model the shared API key can reach. CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 shows all models returned by /v1/models. For a gateway backed by a shared key, this can expose model names your org did not intend to make visible to individual developers. Leave discovery off unless you control which models the gateway returns.

Missing beta header passthrough causes silent feature degradation. A gateway that strips anthropic-beta will not break basic completion but will silently disable features like prompt caching. Test with claude -p "echo hello" --output-format json and check total_cost_usd. If caching is working, repeated identical prompts should show lower cost on subsequent calls.
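The check above can be scripted roughly as follows. This is a sketch, not a definitive test: the function names extract_cost and compare_cache_cost are hypothetical, and the sed pattern assumes the cost appears in the JSON output as a plain number under the total_cost_usd key.

```shell
# Pull total_cost_usd out of Claude Code's JSON output without needing jq.
extract_cost() {
  sed -n 's/.*"total_cost_usd":[[:space:]]*\([0-9.eE+-]*\).*/\1/p'
}

# Run the same prompt twice and print both costs for comparison.
compare_cache_cost() {
  first=$(claude -p "echo hello" --output-format json | extract_cost)
  second=$(claude -p "echo hello" --output-format json | extract_cost)
  echo "first=$first second=$second"
}
```

Calling compare_cache_cost through the gateway and again with a direct connection gives a rough side-by-side; a second cost that never drops through the gateway suggests the beta header is being stripped.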

Session attribution for proxy-side aggregation

Claude Code attaches X-Claude-Code-Session-Id to every request. Your gateway can use this to aggregate all API calls from a single conversation without parsing request bodies. For parallel subagents, X-Claude-Code-Agent-Id identifies the individual agent that made the request, and X-Claude-Code-Parent-Agent-Id identifies who spawned it. Both are ephemeral per-spawn identifiers, not persistent user IDs.
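A sketch of the proxy-side aggregation, assuming the gateway writes one tab-separated line per request with the X-Claude-Code-Session-Id value in the first column and the request cost in USD in the second. That log layout and the function name sum_cost_by_session are hypothetical, not a LiteLLM format:

```shell
# Sum request cost per session from "session_id<TAB>cost_usd" lines on stdin.
sum_cost_by_session() {
  awk -F '\t' '{ total[$1] += $2 }
       END { for (s in total) printf "%s\t%.4f\n", s, total[s] }' | sort
}
```

For example, feeding it three request lines where two share a session ID yields one total per session, sorted by session ID.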

When NOT to route through a gateway

You use server-managed org policy. Gateways using Bedrock/Vertex/Foundry format bypass server-managed settings delivery.
Your gateway does not forward anthropic-beta. You will lose prompt caching, extended thinking, and any feature gated behind beta headers.
You are on a personal Claude Max plan. Gateways are an enterprise pattern; the per-session cost structure of Max does not benefit from gateway-side cost tracking.

Gateway API formats (referenced in Step 1):

| Format | Path | Required passthrough |
| --- | --- | --- |
| Anthropic Messages | `/v1/messages`, `/v1/messages/count_tokens` | `anthropic-beta`, `anthropic-version` headers |
| Bedrock InvokeModel | `/invoke`, `/invoke-with-response-stream` | `anthropic_beta`, `anthropic_version` body fields |
| Vertex rawPredict | `:rawPredict`, `:streamRawPredict` | `anthropic-beta`, `anthropic-version` headers |
