Effort levels control how much thinking Claude does before responding. Model selection controls which model does the thinking. Fast mode controls how quickly responses come back. All three are independent.
If a response feels shallow, try raising the effort level before switching to a more expensive model.
Set the effort level for a session
/effort high
Available levels:
| Level | Opus 4.7 | Opus 4.6 | Sonnet 4.6 |
|---|---|---|---|
| low | Yes | Yes | Yes |
| medium | Yes | Yes | Yes |
| high | Yes | Yes | Yes |
| xhigh | Yes | No | No |
| max | Yes | Yes | Yes |
xhigh is Opus 4.7 only. It runs more extended thinking passes than high.
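As a rough illustration, the support table above can be encoded as a small shell check, for example in a wrapper script that validates a level before starting a session. The function name effort_supported and the short model labels (opus-4.7, opus-4.6, sonnet-4.6) are hypothetical, chosen for this sketch; they are not Claude Code identifiers.

```shell
# Sketch only: mirrors the support table above.
# effort_supported and the model labels are hypothetical names for illustration.
# The model label itself is not validated; only xhigh depends on it.
effort_supported() {
  model="$1"
  level="$2"
  case "$level" in
    low|medium|high|max)
      # Listed as available on all three models in the table.
      return 0 ;;
    xhigh)
      # Listed as Opus 4.7 only.
      [ "$model" = "opus-4.7" ] ;;
    *)
      # Anything else is not a level in the table.
      return 1 ;;
  esac
}

if effort_supported "opus-4.6" "xhigh"; then
  echo "supported"
else
  echo "unsupported"   # this branch runs for opus-4.6 with xhigh
fi
```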
To set the level for a single session with an environment variable:
CLAUDE_CODE_EFFORT_LEVEL=high claude
The level persists for the entire session. You can change it mid-session with /effort.
Trigger max thinking for one prompt
Add ultrathink anywhere in your message:
ultrathink: Review this database migration for correctness and atomicity.
This triggers maximum extended thinking for that single message only. The effort level for the rest of the session is unchanged.
The opusplan alias does the same thing but is specific to planning tasks. Both keywords work regardless of which Opus model you are using.
See the thinking in real time
Option+T (Mac) or Alt+T (Windows/Linux) toggles the thinking display. When on, Claude’s reasoning process appears before the response. Useful for debugging why Claude reached a particular conclusion.
The thinking display can be verbose on xhigh or max. Turn it off for routine sessions and on only when you need to inspect the reasoning.
1M context window
Append [1m] to your model config to use the 1 million token context window variant:
export ANTHROPIC_MODEL="claude-opus-4-7-20260101[1m]"
The 1M context window is useful for sessions that need to hold an entire large codebase in context. It costs more per token and is slower on the first turn as the context loads.
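If the model ID is assembled in a script, a small helper can add the suffix without doubling it. This is a sketch under that assumption; with_1m_context is a hypothetical function name, and the model ID is the one from the example above.

```shell
# Sketch only: with_1m_context is a hypothetical helper name.
# Appends the [1m] suffix unless the model ID already ends with it.
with_1m_context() {
  case "$1" in
    *"[1m]") printf '%s\n' "$1" ;;      # suffix already present, leave as-is
    *)       printf '%s[1m]\n' "$1" ;;  # add the suffix
  esac
}

ANTHROPIC_MODEL="$(with_1m_context "claude-opus-4-7-20260101")"
export ANTHROPIC_MODEL
echo "$ANTHROPIC_MODEL"   # prints claude-opus-4-7-20260101[1m]
```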
Effort levels vs fast mode vs model selection
These three levers compose but do not overlap:
- Effort level controls how much thinking goes into each response. It affects quality.
- Fast mode (Opus 4.6 only) controls how quickly responses come back. It affects latency and cost per token.
- Model selection picks the model. It affects the capability ceiling and base cost.
A max-effort Sonnet 4.6 session is different from a medium-effort Opus 4.7 session. Neither is universally better; match the combination to the task.
Opus 4.7 adaptive reasoning
On Opus 4.7, the model always uses extended thinking. It scales the amount of thinking based on perceived task complexity. You cannot turn extended thinking off on Opus 4.7; you can only set a floor (effort level) for how much thinking to do.
On Opus 4.6 and Sonnet 4.6, extended thinking is off by default and turned on by effort level settings.
Footguns
Higher effort does not help IO-bound tasks. If Claude needs to read 40 files and the bottleneck is tool calls, more thinking between tool calls adds latency without better output. Reserve high and max for tasks that require reasoning over information already in context.
xhigh on Opus 4.7 gets expensive fast. Extended thinking tokens at xhigh can be 10-20x the cost of a standard response on the same task. Use it selectively, not as a default.
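To make the multiplier concrete, here is a back-of-the-envelope sketch. The 5-cent baseline is a made-up figure for illustration, not a real price, and xhigh_cost_range is a hypothetical helper name; only the 10-20x range comes from the text above.

```shell
# Sketch only: xhigh_cost_range is a hypothetical helper name.
# Takes the cost of a standard response in whole cents and prints the
# 10x-20x range quoted above for the same task at xhigh.
xhigh_cost_range() {
  printf '%d-%d cents\n' "$(( $1 * 10 ))" "$(( $1 * 20 ))"
}

baseline_cents=5   # assumed figure for illustration, not a real price
xhigh_cost_range "$baseline_cents"   # prints 50-100 cents
```

At 200 such tasks, the same assumed baseline would grow from 1000 cents to somewhere between 10000 and 20000 cents, which is why xhigh works better as an exception than as a default.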
Effort level is not saved across sessions. /effort high applies to the current session only. To make a level persistent, add export CLAUDE_CODE_EFFORT_LEVEL=high to your shell’s rc file so the variable is passed to every new claude process.
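One way to add that setting without duplicating it on repeated runs is sketched below. persist_effort_level is a hypothetical helper name, and the rc file path depends on your shell (~/.zshrc, ~/.bashrc, and so on). The sketch writes the variable with export so that it reaches the claude process.

```shell
# Sketch only: persist_effort_level is a hypothetical helper name.
# Usage: persist_effort_level LEVEL RC_FILE
# Appends the export line to RC_FILE unless that exact line is already there.
persist_effort_level() {
  level="$1"
  rc_file="$2"
  line="export CLAUDE_CODE_EFFORT_LEVEL=$level"
  touch "$rc_file"   # create the file if it does not exist yet
  # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
  grep -qxF -- "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}

# Example call (commented out so that pasting the block changes nothing):
# persist_effort_level high "$HOME/.zshrc"
```

The helper only guards against the exact same line. If the file already sets a different level, the new line is appended after it and, being later in the file, takes effect in new shells.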
ultrathink is a one-off trigger. It does not affect the rest of the session. For max thinking across all turns in a planning session, use /effort max instead.
Thinking content counts toward context. On max or xhigh, the thinking tokens are added to the conversation context. In a long session, this accelerates context window exhaustion. Run /context to check usage if a session starts feeling sluggish.
When NOT to raise effort levels
- The task is mechanical: renaming variables, formatting, adding type annotations. More thinking produces the same output with higher cost and latency.
- You are debugging a shallow error. A stack trace is a stack trace; reasoning harder does not make it more readable.
- You are on a Bedrock or Vertex deployment. Effort levels work, but fast mode does not, so you cannot compensate for the added latency on high-volume tasks.
- You are running a batch pipeline. Effort low or medium is usually sufficient for automated code review passes, and the cost difference at scale is significant.