Can I use Haiku for everyday Claude Code work?

Question

Kalle Lamminpää · Accepted Answer

Haiku 4.5 is fast and cheap, roughly a third the price of Sonnet 4.6 per token, with near-frontier intelligence on scoped work. Reach for it on mechanical batch tasks, subagents, and latency-sensitive turns; stay on Sonnet for hard debugging or architecture, where the capability gap still bites.

The numbers in one table

Per million tokens, on the Anthropic API:

Haiku is 3x cheaper than Sonnet and 5x cheaper than Opus on both input and output. The price gap is real and the capability gap is smaller than the price gap on most simple tasks, which is the whole reason to pick Haiku.

How to switch to Haiku

Three ways, in order of priority:

The haiku alias resolves to the latest Haiku version your account supports. To pin a specific version, set the env var:

Pinning matters more than people realize: this env var also controls the model used for background functionality such as title generation and the conversation summarizer. If you do not pin it, those background calls drift to whatever Haiku version your provider promotes next.

When Haiku wins

High-volume mechanical work. Renaming a variable across 200 files, regenerating boilerplate, adding type annotations, scaffolding test files. The model just has to do the obvious thing many times. Haiku is fast enough that you feel the difference; the per-call cost is low enough that you stop counting.

Subagent fan-out. Code review pipelines, doc generators, batch annotators. Each subagent gets its own context, runs in parallel, and finishes fast. Set CLAUDECODESUBAGENTMODEL=haiku to route every subagent to Haiku regardless of the parent model. Pair it with Sonnet at the top, so the orchestrator stays smart and the workers stay cheap:

Latency-sensitive interactive turns. "Show me what splitId does", "regenerate the hooks file from scratch", "fix this lint error". Sonnet is fast; Haiku is faster, and the user feels every saved second on the second and third follow-up.

Cost ceilings on long tasks. A nightly batch job that runs for hours over thousands of inputs has a per-token cost that compounds. The same task on Haiku costs a third of Sonnet; the same job on Opus costs five times Haiku. If accuracy is good enough on Haiku, the cost difference goes straight to your runway.

When Haiku loses

Deep debugging across multiple files. When the bug requires holding ten files in working memory and reasoning about an emergent behavior, Sonnet's adaptive reasoning earns its price. Haiku does not have adaptive thinking; it spends a fixed thinking budget per turn and cannot scale up on a single hard step.

Architecture and design. "Should we split this service in two?" deserves Sonnet at minimum. Haiku will give you a confident answer that misses the second-order constraint that matters most. The cost of one bad architecture call is more than a year of Sonnet token spend.

Long-context work. Haiku tops out at 200k context tokens; Sonnet and Opus support 1M. If your repo plus reference docs do not fit in 200k, Haiku is not the model. Watch for /compact firing more often than usual on Haiku as a tell.

Anything that wants /effort high. Effort levels are only supported on Opus 4.7, Opus 4.6, and Sonnet 4.6. Setting /effort high while on Haiku silently does nothing. If your workflow depends on dialing reasoning up for hard turns, you cannot do it on Haiku.

A working multi-model pattern

Most teams running Haiku in production do not run it alone. They use a hybrid:

claude --model sonnet as the daily driver, so reasoning is good by default.
CLAUDECODESUBAGENTMODEL=haiku so subagents (which fan out, isolate context, and rarely need deep reasoning) run on the cheap model.
/model opus reserved for hard debugging sessions and architecture reviews.
A separate CI script or skill that runs heavy mechanical work explicitly via claude --model haiku (for example, a nightly "regenerate fixtures" job).

This gives you Sonnet-quality interactive work, Haiku-priced subagents, and Opus available when you need it, without Opus's price tag eating your daily budget.

Footguns

/effort does nothing on Haiku. Effort levels are listed as supported on Opus 4.7, Opus 4.6, and Sonnet 4.6 in the docs; Haiku is not in the list. If you set /effort high and switch to Haiku, the indicator may show "with high effort" but the API call goes through unchanged. Why this matters: workflows that depend on dialing effort up for the hard turn (a common pattern in code review skills) will silently regress when run on Haiku. Confirm the effort feature is actually firing, or branch your skill on the active model and skip the /effort step on Haiku.

No adaptive thinking on Haiku. Sonnet 4.6 and Opus 4.7 use adaptive reasoning, deciding per-step whether and how much to think. Haiku does not; it gets a fixed thinking budget. Why this matters: a benchmark that compares Sonnet and Haiku on simple tasks will show similar quality, then diverge sharply on a single hard step where Sonnet allocates more thinking and Haiku does not. If you A/B Haiku vs. Sonnet, include hard cases in the eval set, not just the median ones.

Background calls use Haiku by default. Conversation title generation, summarization, and other "small fast" tasks run on whatever ANTHROPICDEFAULTHAIKUMODEL resolves to. If you are paying per-token (API or extra usage) and the background model drifts to a newer, more expensive Haiku, your "free" background work becomes a line item. Pin the version explicitly.

CLAUDECODESUBAGENTMODEL overrides parent model. Set it once, and every subagent on the machine uses that model, including subagents in projects you forgot you had set this for. Why this matters: a subagent that needs Sonnet's reasoning (say, a thorough code-reviewer) silently downgrades to Haiku and produces a weaker review without you noticing. Either set it per-project (in .envrc or a wrapper script) or set it per-subagent via the agent definition's model field.

Smaller context plus same daily compaction rhythm equals more compactions. A heavy session that compacts every 90k tokens on a Sonnet 1M run will compact every ~30k tokens on Haiku 200k. Why this matters: each compaction loses fidelity. The cost savings from Haiku may be partly absorbed by quality regression from extra compactions on long sessions. For long sessions with large code reads, stay on a 1M-context model.

When NOT to use Haiku

You are doing something hard once. If the task is rare and important (a real production bug, a design call, a security review), the cost difference between Sonnet and Haiku is rounding error against the cost of being wrong. Pick the smarter model.
You depend on /effort or extended thinking strategy. Haiku does not support effort levels and does not do adaptive reasoning. If your skill or subagent leans on either, switching it to Haiku silently breaks the pattern.
Your context is over 200k tokens. Haiku tops out at 200k; the 1M models give you headroom. Save Haiku for short or chunked work and let Sonnet/Opus handle long-context jobs.
You have not measured. "Cheaper" is not always cheaper. If a task takes Haiku three retries to do what Sonnet does in one, you pay for Haiku three times and waste your time. Run an honest A/B before adopting Haiku as the default for a task you care about.
You want one model for everything. Hybrid routing (Sonnet for orchestration, Haiku for subagents, Opus for hard turns) wins on cost and quality, but it adds operational complexity. If you cannot maintain three model configurations, stay on Sonnet and revisit when the cost actually hurts.

Model	Input	Output	Context	Max output	Adaptive thinking	Effort levels
Haiku 4.5 (`claude-haiku-4-5-20251001`)	$1	$5	200k	64k	No	No
Sonnet 4.6 (`claude-sonnet-4-6`)	$3	$15	1M	64k	Yes	Yes
Opus 4.7 (`claude-opus-4-7`)	$5	$25	1M	128k	Yes	Yes

Can I use Haiku for everyday Claude Code work?

The numbers in one table

How to switch to Haiku

When Haiku wins

When Haiku loses

A working multi-model pattern

Footguns

When NOT to use Haiku

Sources

Read more