/usage shows your current session’s token spend and a local dollar estimate; the Claude Console Usage page is authoritative for billing. Most “why is my bill so high” answers fall out of three numbers: which model you chose, your prompt-cache hit rate, and whether you left an agent team running.
The numbers that matter
The canonical doc puts enterprise spend at roughly $13 per developer per active day and $150-250 per developer per month, with 90% of users below $30 per active day. Translation: if your team’s average is hitting $40+/dev/day, something is off (Opus on routine tasks, sessions that never get /cleared, idle agent teams). If it is below $5/dev/day, half the team is probably not actually using Claude.
See your spend in three places
Live in a session:
/usage
Shows token counts, an estimated dollar figure, and (for Pro/Max subscribers) plan usage bars. The estimate is computed locally from token counts and may differ from your actual bill; treat it as a directional signal, not invoice-ready accounting.
Always-visible status line. Configure your status line to show usage so you do not have to type /usage every five minutes. Especially useful when you are debugging a session that feels expensive.
Authoritative billing: the Usage page in the Claude Console. Cost and usage reporting for admins; this is where you go when finance asks why the line item moved.
Manage spend for a team
Three layers, choose based on org size:
- Workspace spend limits (Claude API). Set a hard ceiling on the Claude Code workspace; a runaway session cannot exceed it. The right shape for any team using direct API access.
- Centralized admin workspace. Provides cross-developer cost reporting in the Console. The dedicated workspace cannot mint API keys for other use; it exists for Claude Code attribution only.
- LiteLLM in front of the API. Several large enterprises route Claude Code traffic through LiteLLM as a proxy that tags every request with a per-team key. Adds operational overhead; gives you per-team attribution that the Console does not yet provide natively.
Where the spend actually goes
| Cause | Why it costs | Fix |
|---|---|---|
| Opus on routine tasks | Opus is several times more expensive per token than Sonnet | Switch to Sonnet for code edits, comprehension, refactors; reserve Opus for ambiguous architecture or hard debugging |
Bloated CLAUDE.md | Loaded on every turn, multiplies by session length | Cap user CLAUDE.md under 100 lines; move long rules into .claude/rules/*.md with paths: frontmatter so they only load when relevant files are read |
| Repeated cold caches | A /loop 5m poll lands on the cache TTL boundary | Use /loop 4m (stays warm) or /loop 20m (amortizes the miss), never exactly 5m |
| Idle agent team running | Each teammate has its own context that ticks even when no one is actively prompting | /team stop when work pauses; do not leave teams running overnight |
| MCP server dumping raw output | A 500-row query lands as 500 rows in context | Wrap noisy MCP tools in subagents so only the summary returns to the parent |
/compact skipped before long pauses | Coming back to a session loads the whole conversation | Run /compact (with custom instructions to keep the bits you need) before walking away |
The cache-hit lever
Prompt caching reduces input-token cost dramatically when the same prefix is re-used within the cache TTL (default 5 minutes, refreshed each time the cached prefix is used). On Claude Code that prefix is the system prompt plus your CLAUDE.md files plus skill bodies plus recent file reads. Two practical implications:
- Long sessions stay cheap as long as the prefix does not churn. Reading a new large file early in a session is paid once if the cache catches; later turns reuse it.
- CLAUDE.md size is a tax that compounds. A 1,000-line CLAUDE.md cached at session start is fine; a CLAUDE.md that gets edited mid-session invalidates the cache and forces a full re-read on the next turn. Edit infrequently; commit changes between sessions.
If your spend per turn is climbing as the session goes on, the cache is not catching. Check /usage for token-by-category counts; if cached input drops while uncached input rises, find what is invalidating (often new file reads, regenerated tool output, or in-session CLAUDE.md edits).
Footguns
/usage dollar estimates are local guesses. Token counts are accurate; the dollar conversion uses a local price table that can drift from billing in either direction. Use it for trend lines (is this session 2x more expensive than yesterday?), not for invoice reconciliation. The Console Usage page is the truth.
Agent teams scale costs with team size, not with output. A 5-teammate team running idle for an hour can burn more than one developer working hard for an hour, because each teammate maintains its own context window and ticks on background activity. /team stop when you are not actively coordinating; do not leave teams running across lunch.
Pro/Max subscribers see “$0” in /usage but still have plan-level usage caps. The session cost line is meaningless for subscribers; the bars showing plan-level usage are the ones to watch. A Pro user blowing through plan limits gets throttled, not billed extra, but the session in progress can still hit the wall.
Per-developer cost averages hide the long tail. “$13/day” sounds tame until you discover the 95th-percentile developer is on $80/day because they run two parallel sessions on Opus. Look at percentile distributions, not averages, when planning rate limits and budgets. The canonical doc itself notes 90% are under $30/day, which means 10% are above.
Switching models mid-session does not retroactively reduce cost. Tokens already burned at Opus rates are not refunded when you /model sonnet. Switching helps the rest of the session; do it early, not after you have already paid for the expensive part.
When NOT to optimize Claude Code cost
- You haven’t hit the bill yet. A pilot group is cheap by design. Spend energy on workflow quality first; optimize cost when you actually feel it.
- The optimization breaks the work. Switching to Sonnet for an architecture refactor that Opus was nailing saves $5 and costs you a day of debugging. Pick the right model for the task, not the cheapest available.
- The fix is “use Claude less”. Cutting usage to cut spend defeats the point of having Claude. The right targets are unnecessary spend (idle teams, cache misses, oversized CLAUDE.md), not productive spend.
- You are debugging a one-off bug. A 30-minute Opus session that finds a production bug pays for itself many times over. Save the cost lens for daily-driver workflows, not for forensics.
- The team is two people. Workspace limits, LiteLLM, admin workspaces, percentile dashboards: all overkill at small scale. Read the Console’s per-key page once a week and call it sufficient until the team grows.