What does Claude Code's skill auto-invocation actually look like in a real session?

Question

Kalle Lamminpää · Accepted Answer

A captured claude --print session against the demo, with a booking-conventions skill at .claude/skills/booking-conventions/SKILL.md describing how the booking subsystem expects code to be written. Claude's first tool call was the Skill tool loading that skill's body, before any file reads; the resulting implementation honored the skill's conventions even though the prompt never mentioned them.

The skill

SKILL.md has YAML frontmatter and a body. The frontmatter tells Claude when to reach for the skill:

The description is the load-bearing field for auto-invocation. Claude reads it at session init and decides whether the current prompt matches. The whentouse field (snakecase, not kebab-case) adds extra trigger phrases.

The body has six sections covering identity (UUID-only IDs, ISO instants, no millisecond arithmetic), service types (the exact 'passport' | 'building-permit' | 'social-services' union and the four-step process for adding a variant), the status state machine (pending → confirmed → cancelled, cancelled is terminal), aggregations (office-local date prefix matching, never raw milliseconds), logging conventions (createLogger({ scope: 'booking' }), structured data as second arg), and test conventions (vitest, vi.fn() for clock injection).

That body would be ~120 lines of CLAUDE.md if pasted there. As a skill, it loads only when relevant.

The prompt

The prompt mentions "BookingService" and "non-cancelled bookings", which match both the description ("queries on bookings", "cancellations") and whentouse ("bookings", "service types") triggers. The prompt does not say "use the skill" or "follow the conventions"; auto-invocation is supposed to do that.

The init event listed the skill

The first system event in the events.jsonl stream is subtype: "init", with metadata about what's loaded into the session. Inspecting it:

The skill is not a permission rule and not a hook; it is its own first-class field in the init payload. The descriptions of all available skills are appended to the system prompt, which is how the model knows what to reach for. The bodies are NOT loaded yet at init; only the descriptions are.

What Claude actually did, in 10 tool calls

The first tool call is the article. Order from the events file:
Skill({skill: 'booking-conventions'}), which loads the full skill body
Read(src/booking/service.ts)
Glob(src/booking/*)
Read(src/booking/types.ts)
Read(src/booking/store.ts)
Read(src/booking/service.test.ts)
Read(src/shared/time.ts)
Edit(src/booking/service.ts) to add the method
Edit(src/booking/service.test.ts) to add four tests
Bash(npm test 2>&1 | tail -60) to confirm green

The Skill call comes BEFORE any reads. That is auto-invocation in action: Claude saw the prompt mention "BookingService" and "non-cancelled", checked the description of every available skill, matched against booking-conventions, and pulled the body into context as its first move. The skill body then lives in context for the rest of the session.

If description had been vague ("Booking-related conventions") or had not mentioned the trigger words, Claude might have skipped the Skill call and gone straight to reads. The skill would have been invisible.

What the implementation looks like

The diff Claude produced for byService:

Three things to notice:

b.status !== 'cancelled' matches the skill's status state machine: cancelled is terminal, exclude. The prompt said "non-cancelled"; the skill specified what that means.
Lexical sort on startsAt is correct because the skill specified that startsAt is an ISO instant in UTC. Lexical order on ISO 8601 instants matches chronological order. The skill turned a "how should I sort dates?" question into a one-line answer.
ServiceType is the imported type, not a string literal. The skill named the exact union; Claude used it.

The four new tests cover sort order, service-type filtering, status filtering (verifying both pending and confirmed are kept and cancelled is excluded), and the empty-result case. The "verify cancelled is excluded" test is the skill's "cancelled is terminal" rule reified as an assertion.

What the skill saved over CLAUDE.md

If these conventions had lived in CLAUDE.md, they would have been in context for every session, including ones about reporting, notifications, or the timezone helpers. The skill loads only when the booking-related prompt triggers it. For a prompt about adding a feature to src/notifications/, the booking-conventions skill stays unloaded and its 120 lines never enter context. Token spend tracks the relevance of the work, not the size of the project's conventions.

Footguns

The description field is the entire matching surface. Claude reads it at init and decides per-prompt whether to load the skill. A description like "Booking stuff" gets matched on prompts that say "stuff" but not on prompts that say "appointments" or "slots". The article's skill description names six trigger phrases (bookings, citizens, services, slots, cancellations, municipal booking flows) on purpose. Why this matters: write descriptions like search-engine ad copy, not like commit messages. The keywords are the API.

whentouse is a separate field, in snakecase. Some doc examples use when-to-use with a hyphen; the canonical name is whentouse. Claude will silently ignore an unrecognized key. Why this matters: if your skill never auto-invokes despite a clear description, check the field name. Run claude --debug once and look for the skill in the loaded list.

Skill bodies live in context for the rest of the session once loaded. A 1000-line skill that loads on every booking prompt costs 1000 lines of context every turn after, until the session ends or compaction drops it. The article's skill is ~120 lines; that's already significant. Why this matters: keep skill bodies tight. If the body is mostly reference material, split into supporting files (reference.md) that the body links to and that load only when the model reads them; see the multi-file-skills-with-supporting-files article for the pattern.

Auto-invocation is not deterministic. A second run of the same prompt might match the skill description differently (or not at all) if the model's interpretation of the prompt shifts. A prompt that mentions "tests for the booking service" might trigger the skill; "tests for service.ts" might not. Why this matters: when correctness depends on the skill loading, name it explicitly in the prompt ("use the booking-conventions skill"). Auto-invocation is a soft signal; explicit invocation is the hard one.

Skills are loaded, not enforced. The skill's content arrives in the system prompt; Claude is free to ignore parts of it. The article's session honored the conventions because they aligned with the work, not because the skill compelled compliance. Why this matters: if a convention must be followed, encode it as a hook (PreToolUse to block violations) or a deny rule, not as a skill. Use skills for guidance, not enforcement.

disable-model-invocation: true flips a skill from auto-invokable to user-only. A skill with that flag in its frontmatter is hidden from auto-invocation; the user must invoke it explicitly via /skill-name. Useful for slow or expensive skills (the article's skill does not need this), but easy to set by mistake and then wonder why auto-invocation stopped. Why this matters: if your skill suddenly stops triggering, check the frontmatter for this flag before suspecting a description match issue.

When skills are the right shape

Conventions that apply to a topic. Booking conventions, payment-flow conventions, deployment-step conventions. The article's case.
Domain knowledge that helps pick the right primitive. "When working with timezones, here's the canonical helper file." The skill points Claude at the right place; the helper file's docstring takes it from there.
Reusable workflows. "Generate a release note from the merged PRs since the last tag." The body is the workflow; the description triggers on "release notes" prompts.
Per-project style guides. Naming, file layout, test patterns. Bigger than CLAUDE.md should hold; smaller than a manual.

When NOT to use a skill

Project-wide rules that apply on every prompt. Use CLAUDE.md. The skill's lazy loading is wasted overhead.
Per-session dynamic content. Use a SessionStart hook (see article #36) to inject fresh data; skills are static.
Hard policy. Use a deny rule or a PreToolUse hook (see article #37). Skills are guidance, not enforcement.
Trivial conventions. "We use 2-space indent" does not earn a skill. CLAUDE.md handles it.
Sensitive content. Skills are readable text the agent can quote back. If your conventions contain credentials or internal incident details, do not put them in a skill.

What does Claude Code's skill auto-invocation actually look like in a real session?

The skill

The prompt

The init event listed the skill

What Claude actually did, in 10 tool calls

What the implementation looks like

What the skill saved over CLAUDE.md

Footguns

When skills are the right shape

When NOT to use a skill

Sources

Read more