AnswerQA

What does Claude Code look like when you ask it to audit a codebase without editing anything?

Answer

A captured `claude --print` session against the demo app, prompted to audit five production-readiness concerns and produce a structured report without modifying any code. Claude used 11 tool calls (1 Glob, 4 Grep, 6 Read), zero edits, finished in 54 seconds, cost 46 cents, and produced a 5-section report with file:line references. The article shows the verbatim audit output, the tool-call census, and a head-to-head comparison against `grep`/`rg` shell scripts for each of the five concerns: TODO scans and missing-test detection are commodity work the agent overpays for, but flagging `new Date()` calls that bypass an injection pattern (vs benign date arithmetic) and reading the dead-letter queue line as 'operational risk' are the kind of semantic judgment that justifies the cost.

By Kalle Lamminpää Verified May 9, 2026

A captured claude --print session against the demo, prompted to audit five production-readiness concerns and produce a numbered report without modifying any code: Claude used 11 tool calls (1 Glob, 4 Grep, 6 Read), zero edits, finished in 54 seconds, and produced a structured report with exact file:line references. This article is the head-to-head against a shell-script alternative for each of those five concerns, and where the agent’s $0.46 actually earned its keep.

The prompt

Audit this codebase for the following production-readiness concerns and
produce a structured report. DO NOT modify any code.

1. Files with no co-located test (look for src/foo.ts without src/foo.test.ts)
2. TODO, FIXME, XXX, HACK comments with exact file:line references
3. Direct 'new Date()' or 'Date.now()' calls that bypass the project's now()
   injection pattern (a testability smell)
4. Any in-memory state on a service that would not survive a process restart
5. Any 'throw new Error(' calls without typed error subclasses

Output a numbered list per concern with file:line references where applicable.
Keep it under 400 words total. Do not write or edit any files.

The “DO NOT modify” constraint is in the prompt twice. It is not enforced by a hook; the demo’s settings.json still allows Edit(*) and Write(*). The constraint is the prompt’s job. The captured permission_denials array on the result event is empty, and the working tree was clean on exit, so Claude obeyed.

What Claude did, in 11 tool calls

The tool-call census is the entire story:

Calls  Tool        Used for
1      Glob        src/**/*.ts to enumerate the source set
1      Grep        TODO|FIXME|XXX|HACK over src
1      Grep        new Date\(|Date\.now\( over src
1      Grep        throw new Error\( over src
1      Grep        now\s*[:=]|now\(\)|now\?: over src/shared/time.ts
6      Read        booking/store.ts, booking/service.ts, reporting/service.ts, notifications/service.ts (×2 with offset/limit), shared/time.ts
0      Edit/Write  (none, prompt forbade it)

The four greps did the bulk of the work. The reads were used to disambiguate grep matches: which new Date() calls are “current time” (bad) versus pure date arithmetic on a parsed instant (fine), which throw new Error is the canonical project pattern versus genuinely untyped, which Map field is held privately on a service vs a transient closure.
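A census like this can be tallied from the captured events rather than counted by hand. The sketch below is a minimal version under one loud assumption: that each line of events.jsonl is a JSON event and that tool calls show up as `tool_use` blocks inside assistant messages. The `toolCensus` helper and the sample lines are hypothetical; check the shape against your own capture before trusting the counts.

```typescript
// Tally tool_use blocks per tool name from a stream-json capture.
// ASSUMPTION: each line is one JSON event, and tool calls appear as
// {"type":"tool_use","name":"..."} blocks inside assistant messages.

interface ContentBlock {
  type: string;
  name?: string;
}

interface StreamEvent {
  type: string;
  message?: { content?: ContentBlock[] };
}

function toolCensus(jsonl: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const event = JSON.parse(line) as StreamEvent;
    if (event.type !== "assistant") continue; // only assistant turns carry tool calls
    for (const block of event.message?.content ?? []) {
      if (block.type === "tool_use" && block.name !== undefined) {
        counts[block.name] = (counts[block.name] ?? 0) + 1;
      }
    }
  }
  return counts;
}

// Hypothetical four-line capture, not the article's real events.jsonl.
const sampleEvents = [
  '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Glob"}]}}',
  '{"type":"assistant","message":{"content":[{"type":"text"},{"type":"tool_use","name":"Grep"}]}}',
  '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Grep"}]}}',
  '{"type":"result","num_turns":4}',
].join("\n");

const census = toolCensus(sampleEvents);
console.log(JSON.stringify(census)); // {"Glob":1,"Grep":2}
```

A zero for Edit and Write is the absence of a key here, so an audit trail would want to check those two names explicitly.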

The audit report, verbatim

This is the captured stdout, with em-dashes lightly normalized to commas:

# Production-Readiness Audit

## 1. Source files without a co-located `*.test.ts`
- `src/booking/store.ts`, no `store.test.ts`
- `src/notifications/templates.ts`, no `templates.test.ts`
- `src/index.ts`, no `index.test.ts`

(Pure type files `booking/types.ts`, `notifications/types.ts`,
`reporting/types.ts` excluded.)

## 2. TODO / FIXME / XXX / HACK comments
None found in `src/`.

## 3. `new Date()` / `Date.now()` bypassing the `now()` injection pattern
Treating `now()` injection as the established pattern (used in
`notifications/service.ts:24` and `shared/logger.ts:35`), these production
calls bypass it:
- `src/booking/service.ts:29`, `createdAt: new Date().toISOString()` in
  `BookingService.create`
- `src/reporting/service.ts:54`, `generatedAt` in `monthlyReport`
- `src/reporting/service.ts:89`, `generatedAt` in `weeklyReport`

Not flagged (pure date arithmetic, not "current time"):
`shared/time.ts:54,91,93`, `reporting/service.ts:21`.
`notifications/service.ts:108` correctly routes through `this.now()`.

## 4. In-memory state that won't survive a process restart
- `src/booking/store.ts:8`, `private rows = new Map<string, Booking>()`
- `src/notifications/service.ts:16`, `private queue: Notification[] = []`
- `src/notifications/service.ts:17`, `private dlq: Notification[] = []`
  (dead-letter queue lost on restart, operational risk)

## 5. `throw new Error(` without typed subclasses
Every throw in `src/` uses the bare `Error` constructor; no domain-specific
subclasses (e.g., `BookingNotFoundError`, `InvalidInstantError`) exist:
- `src/booking/store.ts:12`, duplicate booking id
- `src/booking/store.ts:32`, booking not found
- `src/notifications/service.ts:72`, DLQ on `markSent`
- `src/notifications/service.ts:77`, notification not found on `markSent`
- `src/notifications/service.ts:89`, DLQ on `markFailed`
- `src/notifications/service.ts:92`, notification not found on `markFailed`
- `src/reporting/service.ts:19`, invalid ISO date in `addDaysIso`
- `src/shared/time.ts:56`, invalid ISO instant in `parseInstant`
- `src/shared/logger.ts:31`, empty logger scope
- `src/shared/logger.ts:57`, empty sub-scope

Callers cannot distinguish "not found" from "in DLQ" from validation errors
without string matching on `error.message`.

Concern-by-concern: agent vs shell script

For each of the five concerns, here is what a shell-script alternative would have produced and how the agent did against it.

1. Files without a co-located test. The shell version is one pipe:

comm -23 <(find src -name '*.ts' ! -name '*.test.ts' | sort) \
         <(find src -name '*.test.ts' | sed 's/\.test\.ts$/.ts/' | sort)

That gives 6 results. The agent gave 3, excluding the three types.ts files on the grounds that “pure type definitions need no runtime test.” That exclusion is judgment a script does not have. It is also the wrong call for a codebase where types files contain runtime guards or zod schemas. Worth knowing: the agent’s category-collapse is an opinion, not a fact.

2. TODO scans. The shell version is grep -rn 'TODO\|FIXME\|XXX\|HACK' src/. It returns nothing. The agent ran the same regex and reported “None found.” The shell version costs nothing and finishes in 30 ms; the agent costs $0.46 and a Grep round-trip. This concern alone is pure overpaying, but it is a small fraction of the total prompt, so the per-concern accounting matters less than the bundle accounting. Still, if all you wanted was concern #2, do not call Claude.

3. new Date() bypassing the now() pattern. This is the concern that flips the value calculation. The shell version is grep -rn 'new Date(\|Date.now(' src/, which returns 7 hits across 4 files. Four of those seven are not what you care about: addDays(parseInstant(iso), 30) constructs a new Date to do arithmetic, but the time it represents is a parameter, not “wall clock now.” The agent correctly classified 3 hits as bypasses (with the now() injection used elsewhere as the corroborating evidence, citing the line numbers where now() IS used) and 4 as benign with explanations. A regex script cannot do that without an AST visitor that knows what parseInstant returns. This is the work that earns the cost.

4. In-memory state on services. The shell version is something like rg 'private (rows|queue|dlq|store)\s*[:=]' src/, but you are basically guessing the variable names. The agent read four files and reported three findings, with the dlq flagged as “operational risk” because losing the dead-letter queue on restart is qualitatively worse than losing the in-flight queue. That qualitative call is the article. A grep would have surfaced the lines without ranking them; the ranking is what an on-call engineer actually wants.

5. Untyped error throws. The shell version is rg 'throw new Error\(' src/, which gives 10 hits. The agent gave the same 10, but added the kicker line: “Callers cannot distinguish ‘not found’ from ‘in DLQ’ from validation errors without string matching on error.message.” That sentence is the audit. The grep gives you the count; the agent gives you the consequence.
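To make the concern 5 consequence concrete, here is a minimal sketch of what typed subclasses would buy a caller. Everything in it is hypothetical: `DomainError`, `NotificationInDlqError`, and `classify` are illustrative names, and `BookingNotFoundError` only borrows the example name from the report. None of this exists in the demo.

```typescript
// Hypothetical error hierarchy; the demo codebase only throws bare Error.
abstract class DomainError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class BookingNotFoundError extends DomainError {
  constructor(readonly bookingId: string) {
    super(`booking not found: ${bookingId}`);
  }
}

class NotificationInDlqError extends DomainError {
  constructor(readonly notificationId: string) {
    super(`notification is in the dead-letter queue: ${notificationId}`);
  }
}

// Callers branch on the type instead of string-matching error.message.
function classify(error: unknown): "not-found" | "in-dlq" | "unknown" {
  if (error instanceof BookingNotFoundError) return "not-found";
  if (error instanceof NotificationInDlqError) return "in-dlq";
  return "unknown";
}

console.log(classify(new BookingNotFoundError("b-1"))); // not-found
console.log(classify(new NotificationInDlqError("n-7"))); // in-dlq
// A bare Error with the same wording stays indistinguishable by type.
console.log(classify(new Error("booking not found: b-1"))); // unknown
```

The last line is the situation the report describes: with bare `Error`, the message text is the only signal a caller has.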

The bundle math: concerns 1, 2, 3 (file co-location, TODO, new Date) are 70% findable by find plus a couple of greps. Concerns 4 and 5 surface as grep matches, but the value comes from the agent ranking and explaining them: DLQ flagged as “operational risk,” untyped errors framed as “callers cannot distinguish.” A shell script lists; the agent ranks and explains. If you only need a list, do not pay; if you need the second sentence, pay.

Numbers from the events.jsonl

The result event carries this:

duration_ms:        54409
num_turns:          12
total_cost_usd:     0.46
input_tokens:       13
cache_read:         360726
cache_creation:     31694
output_tokens:      3336
permission_denials: []

54 seconds, 12 turns, 46 cents. The 360k cache-read tokens are the demo’s small surface (16 source files, ~440 lines across the read set) prefixed with the standard system prompt and tool definitions, hot-cached from the SessionStart hook’s git context injection that ran on session boot. On a first-touch session against a 500-file codebase, expect 5 to 10x more cache creation, a multi-minute duration, and several dollars in cost. The Glob would still be one call, but the Read budget grows roughly with the file count it judges worth disambiguating. Plan accordingly.
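Pulling the headline numbers out of a capture is a few lines. This is a sketch, assuming the `result` event is a single JSON line carrying `duration_ms`, `num_turns`, `total_cost_usd`, and `permission_denials` at the top level, as listed above. The `summarizeRun` helper is hypothetical, and the sample line reuses this run's values rather than the real file.

```typescript
// Summarize the result event from a stream-json capture.
// ASSUMPTION: the four fields below sit at the top level of the result event.
interface ResultEvent {
  type: "result";
  duration_ms: number;
  num_turns: number;
  total_cost_usd: number;
  permission_denials: unknown[];
}

function summarizeRun(jsonl: string): string | undefined {
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines
    const event = JSON.parse(line) as { type?: string };
    if (event.type !== "result") continue;
    const result = event as ResultEvent;
    const seconds = Math.round(result.duration_ms / 1000);
    const cost = result.total_cost_usd.toFixed(2);
    const denials = result.permission_denials.length;
    return `${seconds}s, ${result.num_turns} turns, $${cost}, ${denials} permission denials`;
  }
  return undefined; // no result event: the session was probably cut short
}

// Sample built from the numbers quoted in this article.
const capture = [
  '{"type":"assistant","message":{"content":[]}}',
  '{"type":"result","duration_ms":54409,"num_turns":12,"total_cost_usd":0.46,"permission_denials":[]}',
].join("\n");

console.log(summarizeRun(capture)); // 54s, 12 turns, $0.46, 0 permission denials
```

Saving this one line per run is enough to compare quarterly audits on cost and duration without keeping the full event stream.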

Footguns

The now() heuristic is implicit. The agent treated notifications/service.ts:24 and shared/logger.ts:35 as evidence that the project has a now() injection pattern. It does, but if you wrote a codebase where now() is named differently (clock(), getCurrentTime(), Time.now()), the agent might still anchor on the wrong landmark and either miss bypasses or over-report them. Why this matters: encode the convention in CLAUDE.md or a skill (see the booking-conventions session) so the agent is not inferring it from grep evidence on the fly.
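For readers who have not met the pattern, this is roughly the distinction the agent was drawing. It is a sketch with hypothetical names (`Clock`, `ReportService`, `addDays`), not the demo's actual code: an injected clock, a direct wall-clock read that bypasses it, and date arithmetic on a parameter that is fine either way.

```typescript
// Hypothetical service; only the shape of the now() injection matters here.
type Clock = () => Date;

class ReportService {
  // Defaulting to the wall clock keeps production call sites unchanged.
  constructor(private readonly now: Clock = () => new Date()) {}

  // Injected: a test can pin the clock and assert on the exact timestamp.
  generatedAtInjected(): string {
    return this.now().toISOString();
  }

  // Bypass: reads the wall clock directly, so a test cannot pin it.
  generatedAtBypass(): string {
    return new Date().toISOString();
  }
}

// Benign: the instant comes from a parameter, not from "now".
function addDays(instant: Date, days: number): Date {
  const msPerDay = 24 * 60 * 60 * 1000;
  return new Date(instant.getTime() + days * msPerDay);
}

const fixed = new Date("2026-01-01T00:00:00.000Z");
const service = new ReportService(() => fixed);

console.log(service.generatedAtInjected()); // 2026-01-01T00:00:00.000Z
console.log(addDays(fixed, 30).toISOString()); // 2026-01-31T00:00:00.000Z
```

A grep for `new Date(` matches every occurrence in this sketch, including the constructor default and the arithmetic helper. Only `generatedAtBypass` is the smell, which is why the convention is worth writing down rather than leaving to inference.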

The audit is not a fix. Five concerns were audited. Three in-memory data structures, three time bypasses, ten untyped errors. The agent did exactly what the prompt asked: report, do not modify. If the goal is to fix, the report is the start of the next session, not the deliverable. Why this matters: a separate session (with the report pasted in or referenced as a file) is the right shape for the fix. Do not ask the audit session to also fix things; the audit prompt’s “DO NOT modify” constraint and the fix prompt’s “go fix this” instruction belong in different sessions.

Cost scales with read disambiguation, not with grep count. The four greps cost almost nothing in cache-creation. The six Read calls are where the cost went. If the codebase has 50 files where grep-flagged lines need to be disambiguated, that is roughly 8× the six reads here, so the audit is about 8× longer and 8× the cost on the read budget alone. Why this matters: scope the prompt’s concerns. “Scan everything everywhere” turns into a Read fan-out that no longer earns its cost. “Audit src/booking/ for X, Y, Z” gives the agent a small enough surface that the per-concern judgment stays cheap.

The agent silently merged “Date.now()” findings with “new Date()” findings. The prompt asked about both. The grep matched both. The report only listed new Date() calls; there were no Date.now() calls in the codebase, but the agent did not say so. A script would have made the absence visible. Why this matters: if the audit is for a compliance trail (someone needs to see “we checked for X and found zero”), explicitly require the agent to list each concern with a “found N” line including zeros. Otherwise the absence-of-finding is invisible.
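The script-side version of “found N, including zeros” is small enough to keep next to the audit prompt. A minimal sketch, with a hypothetical `countFindings` helper and made-up sample source lines rather than the demo's files:

```typescript
// Emit one "found N" line per pattern, zeros included.
function countFindings(source: string, patterns: Array<[string, RegExp]>): string[] {
  return patterns.map(([label, pattern]) => {
    // Rebuild with the global flag so match() returns every occurrence.
    const globalPattern = new RegExp(pattern.source, "g");
    const found = source.match(globalPattern)?.length ?? 0;
    return `${label}: found ${found}`;
  });
}

// Made-up source text for illustration only.
const sampleSource = [
  "const createdAt = new Date().toISOString();",
  "const later = new Date(start.getTime() + 1000);",
].join("\n");

const findings = countFindings(sampleSource, [
  ["new Date(", /new Date\(/],
  ["Date.now(", /Date\.now\(/],
]);

console.log(findings.join("\n"));
// new Date(: found 2
// Date.now(: found 0
```

The zero line is the part the agent's report left out. It costs nothing to produce and is what a compliance reader looks for.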

The “no test file” check excluded type-only files. That exclusion is reasonable for the demo. It is not always reasonable. Some teams mandate a co-located test even on *types.ts files (for type-level tests with expectTypeOf, or to lock barrel exports). Why this matters: if your team has a test-everything rule, override the agent’s reasonable-by-default exclusion in the prompt: “Include *types.ts files in the no-test-file list.”

When the read-only audit shape is worth it

  • Onboarding to an unfamiliar codebase. The shape doubles as a reading guide: the agent’s reads tell you which files matter, the report tells you the shape of the trouble.
  • Pre-PR review of a large refactor. Run the audit on the branch, paste the report into the PR description as a “known unaudited surface” section.
  • Periodic baseline audits. Quarterly: same prompt, same five concerns, save the output. Diffing audits over time surfaces drift earlier than a CI lint rule would have caught it.
  • Concerns that need ranking, not listing. “Which of these in-memory state leaks is operational risk vs cosmetic?” is the question grep cannot answer.

When NOT to use this shape

  • A single concern that is regex-shaped. “Find all console.log calls” or “find all any types” is rg. Save the 46 cents.
  • Concerns that need a real type-aware tool. “Find all unused exports” wants ts-prune or knip, not Claude. The agent will produce something that looks right and might be wrong; a tool with a TypeScript program will be authoritative.
  • CI gates. Audits are advisory by design (the prompt’s own constraint is “do not modify”). Wiring an LLM call into CI as a pass/fail is the wrong shape; the cost compounds, the determinism does not, and a flaky audit fails builds for no reason. Use the audit at edit time, not at merge time.
  • Codebases where you have not encoded conventions. As above with now(): the agent’s value comes from anchoring on conventions it can read. If the codebase has none, the audit reads as generic best-practices advice that any engineer could have written without reading your code.
  • Sensitive code paths under read restrictions. A read-only audit still reads. If part of src/ contains secrets, license-restricted code, or regulated data, audit the sensitive surface separately under a session that has a Read deny rule scoped tight to that subtree.

Sources

  • Claude Code CLI reference
    Documents `claude --print --output-format stream-json --verbose`. The `result` event in events.jsonl carries `total_cost_usd`, `duration_ms`, and `num_turns`, which is where this article's 54-second / $0.46 / 12-turn numbers come from.
  • Configure permissions
    The demo allows Read(*), Glob, and Grep without confirmation; Edit and Write would also be allowed but the prompt explicitly forbids modification. The 'permission_denials' array on the result event is empty for this run, confirming Claude never even attempted an edit.
  • Claude Code best practices
    The doc recommends scoping Claude to specific tasks; a read-only audit with a numbered list of concerns is exactly the shape that doc points at. The article tests whether that recommendation holds up against a shell-script alternative.
