Haiku 4.5 is fast and cheap, roughly a third the price of Sonnet 4.6 per token, with near-frontier intelligence on scoped work. Reach for it on mechanical batch tasks, subagents, and latency-sensitive turns; stay on Sonnet for hard debugging or architecture, where the capability gap still bites.
The numbers in one table
Per million tokens, on the Anthropic API:
| Model | Input | Output | Context | Max output | Adaptive thinking | Effort levels |
|---|---|---|---|---|---|---|
| Haiku 4.5 (claude-haiku-4-5-20251001) | $1 | $5 | 200k | 64k | No | No |
| Sonnet 4.6 (claude-sonnet-4-6) | $3 | $15 | 1M | 64k | Yes | Yes |
| Opus 4.7 (claude-opus-4-7) | $5 | $25 | 1M | 128k | Yes | Yes |
Haiku costs a third of what Sonnet does and a fifth of what Opus does, on both input and output. The price gap is real, and on most simple tasks the capability gap is smaller than the price gap, which is the whole reason to pick Haiku.
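To make the ratios concrete, here is a minimal sketch that prices one workload against the three rows of the table. The helper name `cost_usd` and the workload size (2M input tokens, 500k output tokens) are hypothetical, chosen only for illustration; the per-million prices are the ones listed above.

```shell
# cost_usd INPUT_TOKENS OUTPUT_TOKENS INPUT_PRICE OUTPUT_PRICE
# Prices are USD per million tokens, copied from the table above.
cost_usd() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_price="$3" -v out_price="$4" \
    'BEGIN { printf "%.2f\n", (in_tok * in_price + out_tok * out_price) / 1000000 }'
}

# Hypothetical workload: 2M input tokens and 500k output tokens.
echo "Haiku 4.5:  \$$(cost_usd 2000000 500000 1 5)"    # 4.50
echo "Sonnet 4.6: \$$(cost_usd 2000000 500000 3 15)"   # 13.50
echo "Opus 4.7:   \$$(cost_usd 2000000 500000 5 25)"   # 22.50
```

Because input and output prices scale by the same factor, the 3x and 5x ratios hold whatever the input/output mix is.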
How to switch to Haiku
Three ways, in order of priority:
/model haiku # in-session, persists to user settings
claude --model haiku # at startup
export ANTHROPIC_MODEL=haiku # default for the shell
The haiku alias resolves to the latest Haiku version your account supports. To pin a specific version, set the env var:
export ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-haiku-4-5-20251001
Pinning matters more than people realize: this env var also controls the model used for background functionality such as title generation and the conversation summarizer. If you do not pin it, those background calls drift to whatever Haiku version your provider promotes next.
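One way to catch an unpinned setup is a small check in your shell profile. This is a sketch under a stated assumption: it treats a model ID ending in an eight-digit date (like the one above) as "pinned" and anything else as "unpinned". The helper name `haiku_pin_status` is hypothetical and not part of any tool.

```shell
# Pin the Haiku version once, for example in ~/.bashrc or ~/.zshrc.
export ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-haiku-4-5-20251001

# Hypothetical helper: report whether the variable holds a dated
# version ID (ends in -YYYYMMDD), something looser, or nothing at all.
haiku_pin_status() {
  case "${ANTHROPIC_DEFAULT_HAIKU_MODEL:-}" in
    "") echo "unset" ;;
    *-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) echo "pinned" ;;
    *) echo "unpinned" ;;
  esac
}

haiku_pin_status   # prints "pinned" with the export above
```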
When Haiku wins
High-volume mechanical work. Renaming a variable across 200 files, regenerating boilerplate, adding type annotations, scaffolding test files. The model just has to do the obvious thing many times. Haiku is fast enough that you feel the difference; the per-call cost is low enough that you stop counting.
Subagent fan-out. Code review pipelines, doc generators, batch annotators. Each subagent gets its own context, runs in parallel, and finishes fast. Set CLAUDE_CODE_SUBAGENT_MODEL=haiku to route every subagent to Haiku regardless of the parent model. Pair it with Sonnet at the top, so the orchestrator stays smart and the workers stay cheap:
export CLAUDE_CODE_SUBAGENT_MODEL=haiku
claude --model sonnet
Latency-sensitive interactive turns. “Show me what splitId does”, “regenerate the hooks file from scratch”, “fix this lint error”. Sonnet is fast; Haiku is faster, and the user feels every saved second on the second and third follow-up.
Cost ceilings on long tasks. A nightly batch job that runs for hours over thousands of inputs has a per-token cost that compounds. The same task on Haiku costs a third of Sonnet; the same job on Opus costs five times Haiku. If accuracy is good enough on Haiku, the cost difference goes straight to your runway.
When Haiku loses
Deep debugging across multiple files. When the bug requires holding ten files in working memory and reasoning about an emergent behavior, Sonnet’s adaptive reasoning earns its price. Haiku does not have adaptive thinking; it spends a fixed thinking budget per turn and cannot scale up on a single hard step.
Architecture and design. “Should we split this service in two?” deserves Sonnet at minimum. Haiku will give you a confident answer that misses the second-order constraint that matters most. The cost of one bad architecture call is more than a year of Sonnet token spend.
Long-context work. Haiku tops out at 200k context tokens; Sonnet and Opus support 1M. If your repo plus reference docs do not fit in 200k, Haiku is not the model. Watch for /compact firing more often than usual on Haiku as a tell.
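Before committing a job to Haiku, a rough size check can tell you whether the material is anywhere near the 200k window. The sketch below assumes about 4 bytes per token, which is a rule of thumb and not a real tokenizer count, so treat the result as an order-of-magnitude estimate and leave headroom for the conversation itself. Both function names are hypothetical.

```shell
# estimate_tokens FILE...
# Rough estimate only: assumes about 4 bytes per token.
estimate_tokens() {
  cat -- "$@" | wc -c | awk '{ print int($1 / 4) }'
}

# fits_haiku TOKENS: succeeds if TOKENS is within the 200k context window.
fits_haiku() {
  [ "$1" -le 200000 ]
}

# Usage (hypothetical file selection):
#   tokens=$(estimate_tokens src/*.ts docs/*.md)
#   if fits_haiku "$tokens"; then echo "fits"; else echo "too large for Haiku"; fi
```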
Anything that wants /effort high. Effort levels are only supported on Opus 4.7, Opus 4.6, and Sonnet 4.6. Setting /effort high while on Haiku silently does nothing. If your workflow depends on dialing reasoning up for hard turns, you cannot do it on Haiku.
A working multi-model pattern
Most teams running Haiku in production do not run it alone. They use a hybrid:
- claude --model sonnet as the daily driver, so reasoning is good by default.
- CLAUDE_CODE_SUBAGENT_MODEL=haiku so subagents (which fan out, isolate context, and rarely need deep reasoning) run on the cheap model.
- /model opus reserved for hard debugging sessions and architecture reviews.
- A separate CI script or skill that runs heavy mechanical work explicitly via claude --model haiku (for example, a nightly “regenerate fixtures” job).
This gives you Sonnet-quality interactive work, Haiku-priced subagents, and Opus available when you need it, without Opus’s price tag eating your daily budget.
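Put together, the hybrid might look like the sketch below. The two exports come straight from the settings discussed earlier. The `nightly_fixtures` function, its prompt, and the `./fixtures` path are hypothetical, and the use of `-p` for a non-interactive run is an assumption about your Claude Code version, so check `claude --help` before relying on it.

```shell
# Shell profile: Sonnet for interactive sessions, Haiku for subagents.
export ANTHROPIC_MODEL=sonnet
export CLAUDE_CODE_SUBAGENT_MODEL=haiku

# Opus stays opt-in: type "/model opus" inside a session when a turn needs it.

# Hypothetical nightly job for cron or CI. Defined here, not run.
# Assumes -p runs a single prompt non-interactively.
nightly_fixtures() {
  claude --model haiku -p "Regenerate the test fixtures under ./fixtures"
}
```

Keeping the heavy mechanical job in its own function or script means its model choice is explicit and does not depend on whatever the interactive default happens to be.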
Footguns
/effort does nothing on Haiku. Effort levels are listed as supported on Opus 4.7, Opus 4.6, and Sonnet 4.6 in the docs; Haiku is not in the list. If you set /effort high and switch to Haiku, the indicator may show “with high effort” but the API call goes through unchanged. Why this matters: workflows that depend on dialing effort up for the hard turn (a common pattern in code review skills) will silently regress when run on Haiku. Confirm the effort feature is actually firing, or branch your skill on the active model and skip the /effort step on Haiku.
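A script that drives a skill can branch on the model before it leans on effort. The sketch below is an assumption-heavy illustration: the helper name `supports_effort` is hypothetical, and the ID patterns are modelled on the IDs in the table (the Opus 4.6 pattern in particular is a guess at its naming). Bare aliases such as haiku or sonnet are treated as unsupported because an alias alone does not say which version it resolves to.

```shell
# supports_effort MODEL_ID: succeeds only for the models listed as
# supporting effort levels (Opus 4.7, Opus 4.6, Sonnet 4.6).
# The ID patterns are assumptions, not an official list.
supports_effort() {
  case "$1" in
    claude-opus-4-7*|claude-opus-4-6*|claude-sonnet-4-6*) return 0 ;;
    *) return 1 ;;
  esac
}

for model in claude-sonnet-4-6 claude-haiku-4-5-20251001; do
  if supports_effort "$model"; then
    echo "$model: run the /effort step"
  else
    echo "$model: skip the /effort step"
  fi
done
```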
No adaptive thinking on Haiku. Sonnet 4.6 and Opus 4.7 use adaptive reasoning, deciding per-step whether and how much to think. Haiku does not; it gets a fixed thinking budget. Why this matters: a benchmark that compares Sonnet and Haiku on simple tasks will show similar quality, then diverge sharply on a single hard step where Sonnet allocates more thinking and Haiku does not. If you A/B Haiku vs. Sonnet, include hard cases in the eval set, not just the median ones.
Background calls use Haiku by default. Conversation title generation, summarization, and other “small fast” tasks run on whatever ANTHROPIC_DEFAULT_HAIKU_MODEL resolves to. If you are paying per-token (API or extra usage) and the background model drifts to a newer, more expensive Haiku, your “free” background work becomes a line item. Pin the version explicitly.
CLAUDE_CODE_SUBAGENT_MODEL overrides the parent model. Export it globally (say, in your shell profile) and every subagent on the machine uses that model, including in projects where you have forgotten it applies. Why this matters: a subagent that needs Sonnet’s reasoning (say, a thorough code-reviewer) silently downgrades to Haiku and produces a weaker review without you noticing. Either set it per-project (in .envrc or a wrapper script) or set it per-subagent via the agent definition’s model field.
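A wrapper is one way to keep the override from leaking. In this sketch the variable is set only for the single command it prefixes, so other projects and sessions keep their own subagent model. The function name `claude_cheap_workers` is hypothetical.

```shell
# Hypothetical wrapper: the assignment prefix scopes the variable to
# this one claude invocation instead of exporting it shell-wide.
claude_cheap_workers() {
  CLAUDE_CODE_SUBAGENT_MODEL=haiku claude --model sonnet "$@"
}

# Per-project alternative: put the plain export line in that project's
# .envrc (this assumes a tool such as direnv loads the file for you):
#   export CLAUDE_CODE_SUBAGENT_MODEL=haiku
```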
Smaller context plus the same workload equals more compactions. A heavy session that compacts every 90k tokens on a Sonnet 1M run will compact every ~30k tokens on Haiku 200k. Why this matters: each compaction loses fidelity. The cost savings from Haiku may be partly absorbed by quality regression from extra compactions on long sessions. For long sessions with large code reads, stay on a 1M-context model.
When NOT to use Haiku
- You are doing something hard once. If the task is rare and important (a real production bug, a design call, a security review), the cost difference between Sonnet and Haiku is rounding error against the cost of being wrong. Pick the smarter model.
- You depend on /effort or extended thinking strategy. Haiku does not support effort levels and does not do adaptive reasoning. If your skill or subagent leans on either, switching it to Haiku silently breaks the pattern.
- Your context is over 200k tokens. Haiku tops out at 200k; the 1M models give you headroom. Save Haiku for short or chunked work and let Sonnet/Opus handle long-context jobs.
- You have not measured. “Cheaper” is not always cheaper. If a task takes Haiku three retries to do what Sonnet does in one, you pay for Haiku three times and waste your time. Run an honest A/B before adopting Haiku as the default for a task you care about.
- You want one model for everything. Hybrid routing (Sonnet for orchestration, Haiku for subagents, Opus for hard turns) wins on cost and quality, but it adds operational complexity. If you cannot maintain three model configurations, stay on Sonnet and revisit when the cost actually hurts.
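The retry arithmetic behind the "You have not measured" point can be sketched in a few lines. The per-attempt costs below are hypothetical ($0.10 on Haiku, $0.30 on Sonnet for the same tokens, following the 3x price ratio), and `effective_cost` is an illustrative helper, not part of any tool.

```shell
# effective_cost PER_ATTEMPT_USD ATTEMPTS
# Total spend when a task needs several attempts to succeed.
effective_cost() {
  awk -v cost="$1" -v attempts="$2" 'BEGIN { printf "%.2f\n", cost * attempts }'
}

# Hypothetical task: $0.10 per attempt on Haiku, $0.30 on Sonnet.
echo "Sonnet, 1 attempt:  \$$(effective_cost 0.30 1)"   # 0.30
echo "Haiku,  2 attempts: \$$(effective_cost 0.10 2)"   # 0.20
echo "Haiku,  3 attempts: \$$(effective_cost 0.10 3)"   # 0.30
```

At three attempts Haiku's token bill matches Sonnet's single attempt, and that is before counting the time spent on the retries, so the break-even on a task you care about arrives sooner than the price table alone suggests.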