What does a real Claude Code session look like fixing a timezone bug end to end?

Question

Kalle Lamminpää · Accepted Answer

A captured claude --print session, run against a tiny TypeScript app with two failing tests on DST boundary days. One prompt produced one diagnosis and one edit that turned the suite green; below are the actual words and the actual diff so you can see the reasoning shape, not a hand-edited story about it.

The setup

A small Node + TypeScript scheduling app for a Finnish municipal booking system. One helper is wrong:

vitest.config.ts pins process.env.TZ = 'Europe/Helsinki' so the failure is deterministic regardless of the developer's machine. Two of the nine tests fail:

The function is named nextDayAtSameLocalTime. Its job is to schedule a reminder at the same wall-clock time the next day. Adding 86,400,000 ms is the wrong primitive: across DST transitions the wall clock jumps an hour, so 24 hours later in UTC is not 09:00 local.

The prompt

A single non-interactive prompt, sent through claude --print:

Nothing else. No "the bug is in time.ts", no hint about DST.

Claude's diagnosis (verbatim)

Root cause: nextDayAtSameLocalTime was adding a fixed 24 hours (ONEDAYMS). That preserves elapsed time, not wall-clock time. Across the spring DST jump (Helsinki UTC+2 → UTC+3), 24h later is 10:00 local; across the autumn fallback, it's 08:00. Neither matches the "same local time tomorrow" the staff reminder needs.

(Punctuation lightly normalized; wording, numbers, and structure are verbatim.)

That sentence is the article. Claude named the right primitive (wall-clock, not elapsed time), the right two specific cases (spring jump and autumn fallback), and the right concrete numbers (10:00 and 08:00). The diagnosis was not "this is a timezone bug, you should look at it"; it was the actual mechanism.

The fix (actual diff)

The shape of this fix is the part worth studying:
Decompose the input into wall-clock parts in the target zone using Intl.DateTimeFormat.formatToParts. This is the only correct primitive in stock JS for "what time is it in Europe/Helsinki right now?".
Increment the day in wall-clock space (p.day + 1), not in elapsed time. This is the one-line change that actually moves the bug.
Two-pass offset resolution for the "we crossed a DST boundary while constructing the answer" edge case. Compute a first-guess UTC instant from the local Y/M/D/H/M, look up Helsinki's offset at that guess, recompute. The second pass is what makes the spring-forward day work cleanly: at the first guess the offset might be the old one; the refinement catches the transition.

The if (hour === 24) hour = 0 line is also worth noticing. Some Intl.DateTimeFormat implementations format midnight as 24:00 rather than 00:00; Claude added the guard without being asked. That kind of small defensive read is the kind of detail that distinguishes a generated edit from a pasted Stack Overflow answer.

What the test suite now looks like

Two passes through the test suite total: the first run before the edit (showing the failure), the second after the edit (confirming green). No flailing, no broken-then-green-then-broken. One read, one diagnosis, one edit, one verification.

What this session does not show, and why

claude --print without --output-format stream-json returns only the agent's final message, not the intermediate tool calls. The transcript above is what Claude said at the end, plus the diff git knows about. We do not have the sequence of Bash(npm test) → Read(src/shared/time.test.ts) → Read(src/shared/time.ts) → Edit(...) → Bash(npm test) events, so we cannot say with certainty which file Claude read first or whether it considered any wrong fixes en route.

For richer captures, modify the script to add --output-format stream-json and parse the events. The trade-off is signal-to-noise: stream-json includes every system event including init, model usage, and tool latencies, which is great for debugging the agent but verbose for an article.

Footguns the session avoided (worth flagging)

Special-casing DST days. A common wrong fix is if (isDstTransitionDay(input)) { ... } else { ... }. That treats the symptom and creates a maintenance bomb when the IANA database updates rules. The right fix reasons in wall-clock time directly, which is correct on every day including the boundary. Claude's diff went straight to the right shape; if it had not, the test would have failed for a different reason on a non-DST day and we would have spotted the over-fit.

Editing the test instead of the code. When given "fix it", an LLM can satisfy the prompt by weakening the test. The success criterion in docs/SCENARIOS.md explicitly checks that the change lives in time.ts, not time.test.ts. Claude did not touch the test file. If you run this scenario yourself and Claude does, that is a signal to tighten the prompt with "without changing the tests".

Adding a date library. This is a 60-line module. A reasonable LLM might suggest npm install date-fns-tz and rewrite the function. That is a real fix but a heavy one for a project with no existing date library. Claude stayed inside the standard library, which matched the project's existing surface. The prompt could pin this with "without adding dependencies" if you have a strong opinion.

Footguns when running this scenario yourself

The session log is short. The capture script writes one file (session.log) with one final message. The shape of the article is "what we have", not "every step the agent took". To capture more, change the capture script to use --output-format stream-json and write the JSONL stream to a separate file. Without that, expect a 5-line transcript even on a complex bug.

Claude's reasoning is not deterministic across runs. A second run might phrase the diagnosis differently or pick a structurally similar but slightly different fix (different helper names, a single-pass offset lookup). The structural shape is reliable; exact phrasing is not. If you cite "Claude said X" in an article, run twice and quote the version that landed in your article so the quote is honest.

--print mode runs without permission prompts only because of .claude/settings.json. The demo allowlists Edit(), Read(), Write(), Bash(npm ), and a few read-only Bash commands. Running this from a directory without that allowlist will hang waiting for a permission prompt that has nowhere to display.

Stable timezone for tests. vitest.config.ts pins TZ=Europe/Helsinki so the bug is deterministic. Without it, a developer running the suite from a UTC machine sees all 9 tests pass and the bug is invisible. Pin the timezone in any test that exercises wall-clock semantics, in CI as well as locally.

When NOT to pattern-match this article to your own debugging

Your bug spans multiple files or services. The session above is a single function in a single file. A real "find the bug across three services" task needs a tighter prompt with file pointers, or you waste tokens on Claude reading every file in the repo before forming a hypothesis.
Your tests do not pin the bug. If running the suite "looks fine" but the system misbehaves in production, Claude has nothing to bisect. The article above only works because the test failure points at the function.
You want a guaranteed exact fix. This article is a transcript, not a benchmark. If Claude solves a similar bug in your repo using a different shape of fix, that is information about your repo's surface, not a regression. Do not paste this article's fix into your codebase verbatim; let Claude read your code and pick the shape that matches.
Your test suite is slow. Each iteration of the agent loop runs the suite. If npm test takes 3 minutes, a single bug fix can burn 10 minutes of wall-clock time waiting for green. Use vitest --run --reporter=dot --pool=threads or scope to one file (vitest --run src/shared/time.test.ts) to keep the loop tight.

What does a real Claude Code session look like fixing a timezone bug end to end?

The setup

The prompt

Claude’s diagnosis (verbatim)

The fix (actual diff)

What the test suite now looks like

What this session does not show, and why

Footguns the session avoided (worth flagging)

Footguns when running this scenario yourself

When NOT to pattern-match this article to your own debugging

Sources

Read more