What does Claude Code's plan mode actually produce on a real refactor?

Question

Kalle Lamminpää · Accepted Answer

A captured claude --print --permission-mode plan session asked Claude to add a retry mechanism with exponential backoff and a dead-letter queue to the demo's notifications service. Plan mode produced a 196-line specification at ~/.claude/plans/-.md with a decision table, state machine, file list, and 23 enumerated test cases; the project was unchanged on exit. The setup The demo has a NotificationService that handles enqueue / mark-sent / mark-failed transitions. Failed notifications today get stamped status: 'failed' and sit in the queue; there is no retry logic and no dead-letter queue. The notifications module also has no test file, so any change to it would need a fresh test suite. The prompt was deliberately execution-shaped: The capture script wrapped it with CAPTURE_FLAGS="--permission-mode plan", which sets the session's permission mode to plan. In plan mode, Claude reads files and runs read-only shell commands but cannot edit your source files. The execution-shaped prompt becomes a planning prompt because Claude cannot do anything else. What plan mode actually let Claude do The events file lists 49 tool calls before the final plan write. By type: 30 Read calls to source and test files 9 Bash calls (all read-only: find, ls, grep) 4 Glob and 2 Grep calls for codebase navigation 2 Agent dispatches (the built-in Explore agent for module survey, and a custom code-architect subagent for the design pass) 1 AskUserQuestion asking how to interpret "up to 3 attempts" 1 Write to ~/.claude/plans/add-a-retry-mechanism-compiled-crayon.md 1 ExitPlanMode to formally exit Plan mode is not "read-only mode". It is "no edits to your source files" mode. Reads, exploration, subagent dispatch, and writes outside the project (~/.claude/plans/) all work normally. The only path Claude is blocked from is the one that would change your code. The clarifying question that auto-resolved In the middle of the session Claude called AskUserQuestion: How should "up to 3 attempts" be counted? 3 total attempts. Initial send + 2 retries. Delays: 1 min, 2 min. After 3rd fails → DLQ. (Most literal reading of "up to 3 attempts".) 1 initial + 3 retries. 4 total attempts. Delays: 1 min, 2 min, 4 min. After 4th fails → DLQ. (Reads "3 attempts" as 3 retry attempts.) In --print mode the user is not present to answer. Claude resolved the question by picking the first option (the literal reading), recorded the choice in the plan's "Decisions" table with the rationale, and continued. The plan's preamble flags this explicitly: "Key choices baked in (locked unless you push back)". This is the right shape for ambiguous prompts under plan mode: the assumption gets documented at the top of the plan so you can see exactly which interpretation Claude picked. Run the same prompt interactively and you would have answered the question yourself; run it under --print and you review it after the fact in the plan. The plan as a deliverable The 196-line plan file at ~/.claude/plans/add-a-retry-mechanism-compiled-crayon.md has nine top-level sections: Context: what the current code does, why it fails, what this change adds Decisions: a 5-row table of design choices with rationale Type changes: the new fields on Notification Service changes: full pseudocode for markFailed's new branching, the backoff formula, the due() method, the constructor option shape State machine: an ASCII diagram of the queue/sent/DLQ transitions Files to modify / create: and explicitly the files that stay unchanged (src/index.ts, src/notifications/templates.ts, src/shared/time.ts) Test plan: 23 enumerated test cases, with the test infrastructure pattern (capturing logger, injectable clock) called out in code Edge cases: markFailed called before nextRetryAt elapses, markSent on a DLQ item, attempts increment timing, DST/timezone irrelevance, noUncheckedIndexedAccess interaction Verification + Backwards compatibility: npm run typecheck, npm test, npm run dev, and a note that index.ts:11's zero-arg constructor still works The plan is not vapor. The decision table names a real tradeoff (retry status stays 'queued' because changing the status union would ripple through callers), the state machine names a real edge (markSent on a DLQ item), and the 23 test cases include several that would catch regressions a less careful implementation would introduce. Why this plan is reviewable in a way code is not Three design decisions in the plan are worth pushing back on before any code lands: The module-level logger is moved to a constructor option. Currently the logger is a module constant; the plan converts it into NotificationServiceOptions.logger? so tests can inject a captured one. This works but is a non-trivial change that affects testability across the codebase. A reviewer might say "use the existing out/err injection pattern from logger.test.ts instead". The plan invites that conversation; the implementation, once written, does not. Retry status stays 'queued', not 'retrying'. Plan rationale: "Keeps the status union minimal and lets pending() keep its existing semantics." Defensible, but a reviewer who plans to write a status-based dashboard might want a distinct state for visibility. The plan surface lets that conversation happen before the union is committed; the implementation commits it. due(now?) vs pending() is a new API split. The plan adds a new accessor with subtly different semantics (pending() includes retrying items not yet due; due() excludes them). Two methods that differ on one condition is a recipe for confusion at call sites. A reviewer might say "merge them, take a now parameter, default to this.now()". The plan invites the merge. These three decisions together would have been ~30 lines of code, ~150 lines of tests, and a day of refactoring after the fact if any of them turned out wrong. Reviewing them as a 196-line plan costs ten minutes. Footguns Plan files have whimsical filenames. The plan landed at ~/.claude/plans/add-a-retry-mechanism-compiled-crayon.md. The "compiled-crayon" suffix is generated; if you run plan mode three times on similar prompts you get three plans with three different word pairs, and only the timestamp tells them apart. Why this matters: list with ls -la ~/.claude/plans/ and rename the ones you keep, or accept that plan files are throwaway artifacts and recreate them. Do not rely on the filename meaning anything. AskUserQuestion in --print mode auto-resolves, silently. Claude picks an option and continues. The plan records the assumption in its "Decisions" table, but if you do not read the table you will miss that a real disagreement was settled by Claude alone. Why this matters: when you read a plan-mode artifact, scan the decisions table first. Anything you would have answered differently is a place to push back before approving the plan. Plan mode is not free. This session did 30 reads, 2 subagent dispatches (each its own mini-session), and 1 AskUserQuestion before producing the plan. Token usage is comparable to a full implementation pass; the savings come from the review surface, not the cost. Why this matters: if your team treats plan mode as "free pre-implementation reconnaissance", you will be surprised by the bill. Run plan mode when the plan itself is the value, not as a pre-warmer for execution. Subagents in plan mode are still subagents. The Explore agent and the code-architect agent each ran in their own context and reported back to the lead. Their tool calls also obey plan mode (no edits), but the overhead of two extra contexts is real. Why this matters: a plan mode session that fans out to subagents costs more than one that does not. If you do not need the subagent fan-out, prompt accordingly ("design this without subagents") or accept the bill. The plan and the implementation can drift. A plan is a snapshot of Claude's design intent at the moment the plan was written. If you approve the plan and run a second session to implement it, that session will read the same files plus the plan, but it can choose to deviate. Why this matters: when you ask Claude to implement an approved plan, paste the plan into the prompt or reference it explicitly; do not assume Claude will read ~/.claude/plans/ on its own. The plan is yours to enforce. Plan mode allows writes to ~/.claude/plans/, not other writes outside the project. The boundary is "no edits to your source files", which Claude interprets as "writes to the plans directory are part of the plan-mode workflow and allowed". Other writes outside the project (say, to /tmp/) may or may not work depending on the permission allow list. Why this matters: do not design plan-mode workflows that depend on Claude writing somewhere unusual. The canonical artifact path is ~/.claude/plans/; everything else is on you to test. When to reach for plan mode Wide-blast-radius refactors. A change that touches multiple modules and changes invariants. The plan's decision table and state machine are review-able in ten minutes; the same review on a 600-line PR takes an hour. Architecture decisions you will live with for years. Plan mode forces the design choice to surface as text before it surfaces as code. The text is editable; the code, after it ships, is not. Code review for an autonomous loop. Before letting Claude run for 20 minutes editing things, run plan mode for 5 minutes and read the plan. Catch the wrong premise here, save the loop. Compliance contexts. Some teams need a written design before changes ship. Plan mode produces one; the artifact lives at a known path; commit it to the repo if you want it auditable. Pair programming with a junior teammate. Run plan mode; review the plan together; let them do the implementation in a second session. The plan is the teaching surface. When NOT to use plan mode Trivial changes. Renaming a variable, fixing a typo, adjusting an assertion. The plan overhead is a tax with no payback. Just edit. When the design is already settled. If the prompt fully specifies the change ("rename log to logger.info in three files"), plan mode produces a redundant document that says the same thing. When you need test results, not a plan. Plan mode runs read-only Bash, so it can execute most test suites. But it cannot edit a fixture or insert a test, so a "verify by running tests after a small change" workflow does not work. Use execution mode and --permission-mode acceptEdits. When the prompt would benefit from clarification. AskUserQuestion auto-resolves in --print. If the prompt has known ambiguity, run plan mode interactively (claude --permission-mode plan, no --print) so you can answer the question Claude was going to ask. When you trust Claude to execute and do not need pre-approval. Most coding work, after a few sessions, falls into this bucket. Plan mode is a tool for high-stakes changes; using it for routine work is over-engineering.

What does Claude Code's plan mode actually produce on a real refactor?

The setup

What plan mode actually let Claude do

The clarifying question that auto-resolved

The plan as a deliverable

Why this plan is reviewable in a way code is not

Footguns

When to reach for plan mode

When NOT to use plan mode

Sources

Read more