How does Claude Code find a regression that the test suite did not catch?

Question

Kalle Lamminpää · Accepted Answer

A captured claude --print session reverted a real bug that had landed three commits earlier. Tests passed both before and after the bad commit because no test covered the affected path, and the user only noticed because a specific cancellation-rate number was wrong in production. Claude made five tool calls, used the suspect commit's message as the lead, and produced a git revert commit that preserved history and left the suite green.

The setup

A bug commit landed at 5b7b333 with the message "refactor(reporting): clearer cancellation-rate denominator". The diff:

The "refactor" turned the cancellation rate from "share of all bookings that were cancelled" (1 of 10 = 0.1) into "ratio of cancellations to non-cancellations" (1 to 9 = 0.111). The names sound similar, so the commit looks innocuous; the math is wrong.
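The original diff is not reproduced here, so the following is only a minimal sketch of the two formulas as described above. The function and field names are hypothetical and are not taken from src/reporting/service.ts; zero-booking handling is deliberately left out.

```typescript
// Hypothetical shape for the counts a monthly report would aggregate.
interface BookingCounts {
  total: number;     // all bookings in the period
  cancelled: number; // bookings that were cancelled
}

// Intended contract: share of all bookings that were cancelled.
function cancellationShare({ total, cancelled }: BookingCounts): number {
  return cancelled / total;
}

// What the "refactor" computes: cancellations relative to non-cancellations.
function cancellationOdds({ total, cancelled }: BookingCounts): number {
  return cancelled / (total - cancelled);
}

const sample: BookingCounts = { total: 10, cancelled: 1 };
console.log(cancellationShare(sample).toFixed(3)); // "0.100"
console.log(cancellationOdds(sample).toFixed(3));  // "0.111"
```

Both functions take the same inputs and return a plausible-looking rate, which is why the change can pass review as a rename of the denominator.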

The test suite passed both before and after this commit. monthlyReport has no test file in the demo, so no assertion pinned the contract. The bug only surfaced when a downstream consumer looked at a specific report and noticed the number was off.

The prompt

Two parts of this prompt are doing work:

The expected vs actual numbers (0.1 vs 0.111). With "the rate seems wrong" alone, Claude would have had nothing to verify against. With concrete numbers, Claude can compute 1 / (10 - 1) = 0.111 and immediately match the wrong formula.
"Find the commit that broke it" points Claude at git history rather than at the code. Without this nudge, Claude might have edited the current file directly, fixing the symptom without finding the regression.

What Claude actually did, in five tool calls

The events file shows the smallest session in the demo's corpus:
1. Bash(git log --oneline -20) to survey recent commits
2. Glob(src/reporting/**/*.ts) to confirm where reporting code lives
3. Bash(git show 5b7b333) to inspect the suspect commit
4. Bash(git revert --no-edit 5b7b333) to undo it
5. Bash(npm test 2>&1 | tail -40) to confirm green

The interesting jump is from step 1 to step 3. Claude saw a commit message that said "cancellation-rate denominator" in git log and went straight to inspecting that specific commit. Step 2 only confirmed where the reporting code lives; the diff shown by git show was enough to verify the bug without ever opening the source file in a Read call. A human spending two minutes on the same hunt would have taken the same path.

Why git revert was the right move

Claude could have edited src/reporting/service.ts directly and undone the change. Instead it ran git revert --no-edit 5b7b333, which:

Produces a new commit (Revert "refactor(reporting): clearer cancellation-rate denominator") that explicitly undoes the named one
Preserves the bug commit in history, with full diff visible to anyone running git log later
Is reviewable as one diff (the revert) rather than as "manual edit that happens to undo something"

For a bug that occupies exactly one commit and that commit contains nothing else worth keeping, git revert is the highest-fidelity intervention. The commit graph reads "we tried, it broke, we reverted" rather than "we changed something that was already there".

If the bad commit had also contained good changes (say, a comment update plus the math regression), git revert would have undone both, and a manual edit would have been correct instead. Claude inspected the commit's full diff before reverting, presumably to confirm there was nothing else worth saving.

The verbatim final reply

Three things to notice:

The commit SHA is named. Anyone reading this reply can git show 5b7b333 and see the same thing Claude saw.
The math change is restated. A reviewer who only reads the reply (not the diff) still gets enough to verify mentally.
The expected output is confirmed. The original user complaint named the expected value (0.1); the reply closes the loop on it.

The deeper bug: a test gap

The article's load-bearing observation is that monthlyReport had no tests. The bug landed because no assertion pinned the cancellation-rate contract. The bug surfaced because a downstream consumer caught it.

If a regression hunt is the right shape of work, it usually means a test was missing. The git-revert commit closes the immediate loop; a follow-up commit should add a regression test for the rate calculation that pins the contract going forward. Claude did not write that test in this session because the prompt did not ask for it. A second prompt ("now add a regression test that would have caught this") would round out the work.
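As a rough idea of what that follow-up test could pin, here is a minimal sketch using Node's built-in assert module. The cancellationRate function is a hypothetical stand-in for the calculation inside monthlyReport, and the demo's actual test runner and file layout are not known, so treat this as an illustration of the assertions rather than a drop-in test file.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical stand-in for the rate calculation inside monthlyReport.
function cancellationRate(cancelled: number, total: number): number {
  return cancelled / total;
}

// Pin the contract with the numbers from the bug report: 1 of 10 is 0.1.
assert.equal(cancellationRate(1, 10), 0.1);

// Guard specifically against the ratio-to-non-cancellations formula.
assert.notEqual(cancellationRate(1, 10), 1 / (10 - 1));

// A share of all bookings tops out at 1 when everything is cancelled.
assert.equal(cancellationRate(10, 10), 1);
```

The second assertion is the one that would have failed on the bad commit; the other two document what the number means.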

Footguns

A misleading commit message would have hidden the bug. Claude jumped to git show 5b7b333 because the message named "cancellation-rate denominator". A message like "refactor(reporting): cleanup" would have given no signal, and Claude would have had to read the diff of every recent reporting commit. The bug was caught easily because the author of the bad commit was honest in the message; in worse codebases the bug would have hidden behind "minor refactor" for weeks. Why this matters: write commit messages that name the surface area you touched. The next regression hunter (human or agent) is faster when the messages are honest.

git log -20 is not always enough range. Claude limited the survey to the last 20 commits and got lucky: the bug was three commits back. If the regression had been in place for two months, the suspect commit could be 200+ commits back. Why this matters: when the bug timeline is unknown, scope git log to the affected file (git log -- src/reporting/service.ts) instead of the whole repo. That filters by relevance, not recency.

Tests passing is not the contract. The suite was 42/42 green both before and after the bug commit because no test covered monthlyReport. The contract was carried entirely by the original code's expression of intent, which is the worst place to keep a contract. Why this matters: if a unit produces a number that another team relies on, write a test that pins the formula. "It used to work" is not a fix you can audit; a test is.

git revert and manual edit are not interchangeable. Revert is the right move when the bad commit contains exactly the change you want to undo. Manual edit is the right move when the commit contains a mix and you want to keep some of it. Claude inspected the suspect commit's diff via git show first; that step is non-optional. Why this matters: do not let an agent reach for git revert reflexively. The diff inspection step is what lets it know the revert is safe.

The user's specific number was load-bearing. "0.111 vs 0.1" was a verifiable claim. "The rate seems wrong" would not have been. The article's session worked because the user did the math, not because Claude is good at finding regressions. Why this matters: when reporting a regression, do the math yourself first and report the expected vs actual values. This unlocks the agent's ability to verify the fix.
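To make "verifiable" concrete, the check a reader (or an agent) can run against the reported numbers looks roughly like this. It is a sketch under assumptions: the candidate list and the matchingFormulas helper are hypothetical, and comparing at three decimal places is a choice made here because the report quoted 0.111.

```typescript
// Candidate formulas to test a reported number against (hypothetical names).
const candidates: Record<string, (cancelled: number, total: number) => number> = {
  "cancelled / total": (c, t) => c / t,
  "cancelled / (total - cancelled)": (c, t) => c / (t - c),
};

// Return the names of formulas whose result rounds to the reported value.
function matchingFormulas(
  reported: number,
  cancelled: number,
  total: number,
  digits = 3,
): string[] {
  return Object.entries(candidates)
    .filter(([, f]) => f(cancelled, total).toFixed(digits) === reported.toFixed(digits))
    .map(([name]) => name);
}

console.log(matchingFormulas(0.1, 1, 10));   // [ 'cancelled / total' ]
console.log(matchingFormulas(0.111, 1, 10)); // [ 'cancelled / (total - cancelled)' ]
```

With only "the rate seems wrong" there is no reported value to pass in, and nothing to match against.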

Read-only git is not enough; you need write access for revert. The demo allows Bash(git log:*), Bash(git diff:*), and Bash(git show:*) (read-only) and Bash(git revert *) (write-light). Without the revert rule, Claude would have had to edit the file manually. Why this matters: pick your git allow list deliberately. Revert is safer than commit (you can revert a revert); push is dangerous (the team sees it). Allow read-only git everywhere, allow revert when the project is small, never allow push without thinking hard.

When this pattern works and when it does not

Works: the regression is recent, the affected file is named, and a commit message in the last 20 entries telegraphs the suspect change. Five tool calls is the right cost.
Works less well: the regression is old, the commit messages are vague, and the affected behavior is not obvious from the diff. You may need git bisect start and a known-good ref. Plan for a longer session.
Does not work: the regression has compounded across multiple commits (each one shifted the contract a little). Reverting one commit may make things worse; you need to read the timeline carefully and decide which contract you actually want.
Does not work: the diff is correct on its face but the wider system has changed (a "refactor" that requires the consumer to also change). The revert undoes the producer-side change but leaves the consumer expecting the new contract. Fix at the API boundary, not at the implementation.

When NOT to ask Claude to do this

You already know the fix. If you can name the file and the line, edit it. The git ceremony is overhead.
The bug is in a commit that mixes good and bad changes. Manual edit by a human (or a more focused prompt) is right.
The bad commit has been pushed and consumed by other work. Reverting it locally is fine, but the team needs to coordinate. Do not let an agent revert shared history without human approval.
The regression spans many commits. This is git bisect territory. Do that yourself; the agent can help interpret the bisect output, but the bisect itself benefits from a human's judgment about what counts as "broken" at each step.
There is no failing test or visible symptom. A regression hunt without a verifiable target is a wild goose chase. Find the symptom first; then come back.
