AnswerQA

How do I see what's eating my Claude Code budget?

Answer

Run /usage in any session for the local dollar estimate; the Claude Console Usage page is authoritative. For teams, an admin workspace centralizes tracking; for org-wide attribution, large deployments often add LiteLLM in front. The biggest wins come from model choice, prompt-cache hit rate, and not running an idle agent team.

By Kalle Lamminpää · Verified May 7, 2026

/usage shows your current session’s token spend and a local dollar estimate; the Claude Console Usage page is authoritative for billing. Most “why is my bill so high” answers fall out of three numbers: which model you chose, your prompt-cache hit rate, and whether you left an agent team running.

The numbers that matter

The canonical doc puts enterprise spend at roughly $13 per developer per active day and $150-250 per developer per month, with 90% of users below $30 per active day. Translation: if your team’s average is hitting $40+/dev/day, something is off (Opus on routine tasks, sessions that never get /cleared, idle agent teams). If it is below $5/dev/day, half the team is probably not actually using Claude.

See your spend in three places

Live in a session:

/usage

Shows token counts, an estimated dollar figure, and (for Pro/Max subscribers) plan usage bars. The estimate is computed locally from token counts and may differ from your actual bill; treat it as a directional signal, not invoice-ready accounting.
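The local estimate is nothing more than token counts multiplied by a built-in price table, which is exactly why it can drift from billing. A minimal sketch of the idea (prices per million tokens are illustrative, not current Anthropic rates):

```python
# Illustrative price table, USD per million tokens. Real rates change,
# which is why a locally computed estimate can drift from the actual bill.
PRICES = {
    "input": 3.00,          # uncached input
    "cached_input": 0.30,   # cache-read input (~0.1x of base, in this sketch)
    "output": 15.00,
}

def estimate_cost(token_counts: dict[str, int]) -> float:
    """Local dollar estimate: tokens-by-category times a static price table."""
    return sum(token_counts[k] * PRICES[k] / 1_000_000 for k in token_counts)

# A session whose input is mostly cached stays cheap:
session = {"input": 50_000, "cached_input": 900_000, "output": 30_000}
```

The token counts are exact; everything uncertain lives in the price table, which is why the output is a trend line rather than an invoice.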

Always-visible status line. Configure your status line to show usage so you do not have to type /usage every five minutes. Especially useful when you are debugging a session that feels expensive.
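A status line is just a command that prints one line; Claude Code pipes session JSON to it on stdin. A sketch of such a script, where the field names (model.display_name, cost.total_cost_usd) are assumptions for illustration:

```python
#!/usr/bin/env python3
# Sketch of a status-line script. Claude Code runs the configured command and
# passes session JSON on stdin; the field names read here are assumptions.
import json
import sys

def render(status: dict) -> str:
    model = status.get("model", {}).get("display_name", "?")
    cost = status.get("cost", {}).get("total_cost_usd", 0.0)
    return f"{model} | ${cost:.2f} this session"

if __name__ == "__main__":
    print(render(json.load(sys.stdin)))
```

Point your status line configuration at a script like this and the running cost stays in view on every turn, no /usage required.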

Authoritative billing: the Usage page in the Claude Console. Cost and usage reporting for admins; this is where you go when finance asks why the line item moved.

Manage spend for a team

Three layers; choose based on org size:

  1. Workspace spend limits (Claude API). Set a hard ceiling on the Claude Code workspace; a runaway session cannot exceed it. The right shape for any team using direct API access.
  2. Centralized admin workspace. Provides cross-developer cost reporting in the Console. The dedicated workspace cannot mint API keys for other use; it exists for Claude Code attribution only.
  3. LiteLLM in front of the API. Several large enterprises route Claude Code traffic through LiteLLM as a proxy that tags every request with a per-team key. Adds operational overhead; gives you per-team attribution that the Console does not yet provide natively.
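Whatever sits in front of the API, per-team attribution reduces to tagging each request with a team key and aggregating the exported usage records. A sketch under an illustrative record schema (the field names are assumptions, not an actual LiteLLM export format):

```python
from collections import defaultdict

def spend_by_team(records: list[dict]) -> dict[str, float]:
    """Aggregate per-request cost records by the team tag on each key."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec["team"]] += rec["cost_usd"]
    return dict(totals)

# Hypothetical export: one record per proxied request.
records = [
    {"team": "platform", "cost_usd": 4.20},
    {"team": "platform", "cost_usd": 1.10},
    {"team": "mobile", "cost_usd": 0.75},
]
```

The proxy's job is only to guarantee every request carries the tag; the reporting on top is this one groupby.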

Where the spend actually goes

| Cause | Why it costs | Fix |
| --- | --- | --- |
| Opus on routine tasks | Opus is several times more expensive per token than Sonnet | Switch to Sonnet for code edits, comprehension, and refactors; reserve Opus for ambiguous architecture or hard debugging |
| Bloated CLAUDE.md | Loaded on every turn, so its cost multiplies by session length | Cap user CLAUDE.md under 100 lines; move long rules into .claude/rules/*.md with paths: frontmatter so they load only when relevant files are read |
| Repeated cold caches | A /loop 5m poll lands exactly on the cache TTL boundary | Use /loop 4m (stays warm) or /loop 20m (amortizes the miss), never exactly 5m |
| Idle agent team running | Each teammate has its own context that ticks even when no one is actively prompting | /team stop when work pauses; do not leave teams running overnight |
| MCP server dumping raw output | A 500-row query lands as 500 rows in context | Wrap noisy MCP tools in subagents so only the summary returns to the parent |
| /compact skipped before long pauses | Coming back to a session reloads the whole conversation | Run /compact (with custom instructions to keep what you need) before walking away |
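The /loop advice is pure arithmetic against the 5-minute refresh-on-use TTL: a 4-minute poll always re-uses the prefix before it expires, an exactly-5-minute poll races the boundary, and a 20-minute poll always misses but polls five times less often. A sketch with illustrative cache-read and cache-write multipliers:

```python
# Hourly input cost of a polling loop against a 5-minute refresh-on-use TTL.
# Multipliers are illustrative: cache writes cost more than base input,
# cache reads cost a small fraction of it.
PREFIX_TOKENS = 200_000
BASE_PRICE = 3.00 / 1_000_000   # USD per input token, illustrative
WRITE_MULT, READ_MULT = 1.25, 0.10
TTL_MIN = 5

def hourly_cost(poll_every_min: int) -> float:
    polls = 60 // poll_every_min
    warm = poll_every_min < TTL_MIN      # re-used before expiry -> cache read
    per_poll_mult = READ_MULT if warm else WRITE_MULT
    # The first poll of the hour always writes the cache.
    return PREFIX_TOKENS * BASE_PRICE * (WRITE_MULT + (polls - 1) * per_poll_mult)
```

Under these assumptions the 4-minute loop is cheapest and exactly-5-minutes is the worst of the three, which is the shape of the advice above: stay inside the TTL, or space polls far enough apart that the miss is amortized.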

The cache-hit lever

Prompt caching reduces input-token cost dramatically when the same prefix is re-used within the cache TTL (default 5 minutes, refreshed each time the cached prefix is used). On Claude Code that prefix is the system prompt plus your CLAUDE.md files plus skill bodies plus recent file reads. Two practical implications:

  • Long sessions stay cheap as long as the prefix does not churn. A new large file read early in a session is paid for once, provided the cache holds; later turns reuse it.
  • CLAUDE.md size is a tax that compounds. A 1,000-line CLAUDE.md cached at session start is fine; a CLAUDE.md that gets edited mid-session invalidates the cache and forces a full re-read on the next turn. Edit infrequently; commit changes between sessions.

If your spend per turn is climbing as the session goes on, the cache is not catching. Check /usage for token-by-category counts; if cached input drops while uncached input rises, find what is invalidating (often new file reads, regenerated tool output, or in-session CLAUDE.md edits).
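That diagnostic can be mechanized: walk the per-turn token breakdown and flag turns where the cached share of input collapses. A sketch over a hypothetical per-turn export (the tuple shape is an assumption; /usage does not emit this format):

```python
def cold_turns(turns: list[tuple[int, int]], min_cached_share: float = 0.5) -> list[int]:
    """Return indices of turns where cached input falls below the given share.

    Each turn is (cached_input_tokens, uncached_input_tokens); a low cached
    share means something invalidated the prefix (new file reads, regenerated
    tool output, mid-session CLAUDE.md edits).
    """
    flagged = []
    for i, (cached, uncached) in enumerate(turns):
        total = cached + uncached
        if total and cached / total < min_cached_share:
            flagged.append(i)
    return flagged
```

A healthy long session looks like a high cached share on every turn after the first; a sawtooth of flagged turns means the prefix is being invalidated repeatedly.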

Footguns

/usage dollar estimates are local guesses. Token counts are accurate; the dollar conversion uses a local price table that can drift from billing in either direction. Use it for trend lines (is this session 2x more expensive than yesterday?), not for invoice reconciliation. The Console Usage page is the truth.

Agent teams scale costs with team size, not with output. A 5-teammate team running idle for an hour can burn more than one developer working hard for an hour, because each teammate maintains its own context window and ticks on background activity. /team stop when you are not actively coordinating; do not leave teams running across lunch.

Pro/Max subscribers see “$0” in /usage but still have plan-level usage caps. The session cost line is meaningless for subscribers; the bars showing plan-level usage are the ones to watch. A Pro user blowing through plan limits gets throttled, not billed extra, but the session in progress can still hit the wall.

Per-developer cost averages hide the long tail. “$13/day” sounds tame until you discover the 95th-percentile developer is on $80/day because they run two parallel sessions on Opus. Look at percentile distributions, not averages, when planning rate limits and budgets. The canonical doc itself notes 90% are under $30/day, which means 10% are above.
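The average-versus-tail point is easy to demonstrate: a couple of heavy users drag the mean far below the percentile that should actually drive your rate limits. A sketch with made-up per-developer daily spend:

```python
def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: small, dependency-free, good enough for budgets."""
    ranked = sorted(values)
    idx = max(0, min(len(ranked) - 1, round(p * (len(ranked) - 1))))
    return ranked[idx]

# 18 developers around $10/day, two running parallel Opus sessions (made-up data).
spend = [10.0] * 18 + [80.0, 95.0]
mean = sum(spend) / len(spend)      # looks tame
p95 = percentile(spend, 0.95)       # what the heavy users actually cost
```

Here the mean is under $18/day while the 95th percentile sits at $80/day; a budget or per-user rate limit set off the mean would throttle exactly the developers doing the most work.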

Switching models mid-session does not retroactively reduce cost. Tokens already burned at Opus rates are not refunded when you /model sonnet. Switching helps the rest of the session; do it early, not after you have already paid for the expensive part.

When NOT to optimize Claude Code cost

  • You haven’t hit the bill yet. A pilot group is cheap by design. Spend energy on workflow quality first; optimize cost when you actually feel it.
  • The optimization breaks the work. Switching to Sonnet for an architecture refactor that Opus was nailing saves $5 and costs you a day of debugging. Pick the right model for the task, not the cheapest available.
  • The fix is “use Claude less”. Cutting usage to cut spend defeats the point of having Claude. The right targets are unnecessary spend (idle teams, cache misses, oversized CLAUDE.md), not productive spend.
  • You are debugging a one-off bug. A 30-minute Opus session that finds a production bug pays for itself many times over. Save the cost lens for daily-driver workflows, not for forensics.
  • The team is two people. Workspace limits, LiteLLM, admin workspaces, percentile dashboards: all overkill at small scale. Read the Console’s per-key page once a week and call it sufficient until the team grows.

Sources

  • Manage costs effectively
    Authoritative: /usage command, admin workspace for centralized tracking, average enterprise spend (~$13/dev/active day, $150-250/dev/month, <$30/day for 90%), TPM/RPM per-user rate-limit recommendations, agent-team scaling.
  • Prompt caching (Anthropic)
    5-minute TTL, refresh-on-reuse semantics, the price difference between cached and uncached input tokens. Source for why /loop interval choice and CLAUDE.md size dominate spend on long sessions.
  • LiteLLM
    Open-source proxy. Anthropic's costs doc mentions large enterprises using LiteLLM for cost metrics on Bedrock, Vertex, and Foundry deployments. Unaffiliated and unaudited, per the doc; useful per-key spend tagging.
