When does fast mode pay back the higher per-token cost?

Question

Kalle Lamminpää · Accepted Answer

Fast mode is not a different model. It is the same Opus 4.6 running on a different API configuration that returns tokens roughly 2.5x faster. You pay $30/$150 per million input/output tokens instead of $15/$75. Same reasoning quality, well under half the wait.

It comes down to whether the time you save justifies twice the token cost for that specific task.
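One way to make that trade concrete is to ask what your time would have to be worth per hour for the premium to pay for itself. The sketch below uses only the rates and the 2.5x figure quoted above; the per-turn token counts and the 20-second standard wait are hypothetical inputs, and it assumes no prompt-caching discount on input.

```python
# Rates quoted in this answer, in dollars per million tokens.
STANDARD = {"input": 15.0, "output": 75.0}
FAST = {"input": 30.0, "output": 150.0}
SPEEDUP = 2.5  # "roughly 2.5x faster", per this answer


def turn_cost(rates, input_tokens, output_tokens):
    """Dollar cost of one turn at the given per-million-token rates."""
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


def break_even_hourly_rate(input_tokens, output_tokens, standard_wait_s):
    """Hourly value of your time at which fast mode exactly pays for itself.

    All three arguments are hypothetical per-turn figures you supply.
    """
    extra_cost = turn_cost(FAST, input_tokens, output_tokens) - turn_cost(
        STANDARD, input_tokens, output_tokens
    )
    seconds_saved = standard_wait_s - standard_wait_s / SPEEDUP
    return extra_cost / (seconds_saved / 3600)


# Hypothetical turn: 20,000 input tokens, 1,000 output tokens, 20 s standard wait.
print(round(break_even_hourly_rate(20_000, 1_000, 20), 2))  # prints 112.5
```

With those made-up numbers, fast mode breaks even if an hour of your waiting time is worth about $112. Heavier context per turn pushes the break-even up; longer waits per turn push it down.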

Enable and disable

The toggle is /fast. Running /fast turns fast mode on for this session; running /fast again switches back to standard. A lightning icon in the status line means fast mode is running.

From the environment, to enable for a script or automated session:

To disable fast mode org-wide (useful for cost control):

Set in managed settings to prevent users from enabling it.

For plans where fast mode is not enabled by default, ask your admin to turn on fastModePerSessionOptIn: true. This lets individual users enable it per session without enabling it globally.

When it pays off

Interactive debugging is the clearest win. When you're cycling through reading output, thinking, and asking the next question, waiting 8 seconds per turn instead of 20 adds up to a noticeably faster loop over an hour.

Live code review is similar. If you're explaining changes to a colleague and using Claude to check specific points in real time, latency is visible and annoying.

A hard deadline is an explicit trade. You have 30 minutes to understand a failing deploy. Burning more money to get answers faster is the trade you're choosing to make.

Short sessions are cheap enough that the premium hardly matters. A 10-turn session that sends a couple hundred thousand input tokens costs roughly $3 more at $30/MTok than at $15/MTok, which may be trivial depending on your plan.
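To check a figure like that against your own sessions, the premium is just the rate difference times the token volume. A minimal sketch using the rates quoted in this answer; the per-turn token counts are hypothetical.

```python
def extra_cost_usd(input_tokens, output_tokens):
    """Fast-mode premium in dollars over standard mode for the given token volume."""
    extra_input_rate = 30.0 - 15.0    # $/MTok difference on input, from this answer
    extra_output_rate = 150.0 - 75.0  # $/MTok difference on output, from this answer
    return (input_tokens * extra_input_rate + output_tokens * extra_output_rate) / 1_000_000


# Hypothetical 10-turn session: 15,000 input and 1,000 output tokens per turn.
session = extra_cost_usd(10 * 15_000, 10 * 1_000)
print(f"${session:.2f}")  # prints $3.00
```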

When it burns money for no gain

Long autonomous tasks are where it costs the most for no return. If Claude is running for 20 minutes while you get coffee, 2.5x speed does not help. You come back in 20 minutes either way.

Batch CI pipelines are not interactive. Faster output means the job ends sooner, but unless you are blocked on CI, the wall-clock benefit is small relative to the cost increase.

High-volume mechanical tasks (renaming 200 functions, adding type annotations, generating test stubs) are token-heavy and benefit more from effort-level tuning than latency reduction.

Fast mode vs effort level

These are independent levers. Fast mode is the same model with lower latency and higher cost per token. Effort level is the same model with more or less thinking time per response, affecting quality. Model selection switches models entirely.

You can run fast mode at low effort for quick-but-cheap mechanical edits. You can run standard mode at max effort when you want maximum reasoning but can wait. They do not conflict.

Footguns

Switching mid-conversation costs the full uncached context. When you toggle /fast in the middle of a long conversation, the context is not cached under the new billing rate. You pay the full input price for the entire conversation history at the fast-mode rate for that next turn. Enable fast mode at the start of a session, not mid-way through a 50-turn debugging session.
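A rough sketch of how much that one turn can cost. The $30/MTok input rate comes from this answer; the cached-read multiplier of 0.1 is an assumption for illustration, so check your actual cache pricing, and the 150,000-token history is hypothetical.

```python
FAST_INPUT_RATE = 30.0       # $/MTok, from this answer
CACHE_READ_MULTIPLIER = 0.1  # ASSUMPTION: cached input billed at 10% of the input rate


def next_turn_input_cost(history_tokens, cached):
    """Input cost in dollars of resending the conversation history on one fast-mode turn."""
    rate = FAST_INPUT_RATE * (CACHE_READ_MULTIPLIER if cached else 1.0)
    return history_tokens * rate / 1_000_000


# Hypothetical 150,000-token history at the moment of the toggle.
print(f"{next_turn_input_cost(150_000, cached=False):.2f}")  # prints 4.50
print(f"{next_turn_input_cost(150_000, cached=True):.2f}")   # prints 0.45
```

Under that assumption, the first turn after a mid-session toggle costs about ten times what a cached turn would, which is why enabling fast mode up front is cheaper.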

Fast mode tokens always count as extra usage, even if you have monthly tokens remaining. They never draw from your included plan bucket. Check your plan's extra-usage rate before enabling it for a team.

Rate limits on fast mode are tighter than standard. If Claude hits the limit, it falls back to standard Opus 4.6 automatically. The status icon turns gray. You are no longer in fast mode, with no warning other than the icon.

Fast mode is not available on Bedrock, Vertex, or Azure Foundry. If you route Claude Code through a third-party provider, the toggle appears in the UI but has no effect.

Opus 4.7 does not support fast mode. If you switch to Opus 4.7, fast mode is automatically disabled.

When NOT to use fast mode

You are running an autonomous overnight task. Standard mode is fine. The job finishes while you sleep either way.
Your plan's extra-usage rate is high. Check before enabling team-wide.
You are on Bedrock, Vertex, or Foundry. The toggle has no effect.
You want more reasoning depth. Use effort levels instead; fast mode does not affect how much Claude thinks per response.
