A few weeks ago I added a line to a project’s CLAUDE.md: “Always run pnpm format after editing TypeScript files.” I committed it, ran a session, watched the agent edit four files, and end the turn before formatting any of them. Not the agent’s fault. I had written an invariant into a layer that delivers wishes.
The fix was a four-line PostToolUse hook. The deeper question, the one this article is about, is which Claude Code extension to reach for in the first place. Capability, judgment, or invariant. Each has a layer that fits it and a layer that fails it.
The short answer
A tool gives the agent an action it can take: read a file, run a command, search the web, call an external service. A skill gives it reusable judgment and workflow knowledge that the model reads on activation. A hook enforces behavior at lifecycle events, deterministically, outside the model. MCP is how you add custom tools when the built-in set (Read, Grep, Bash, Edit, Write, WebSearch, Skill, Agent, and friends) is not enough.
The rule that organizes the rest of this article: capability in tools, judgment in skills, invariants in hooks.
Why this matters for subagent design
This is the follow-up the founding subagent article punted on. A subagent has a tool list. It can also load skills, run inside hooks the project committed, and have MCP servers attached. Without a layer model, those choices collapse into “give it everything that might help” and subagent tool lists become a junk drawer.
A concrete tell: if you keep adding bullets to the system prompt every time a reviewer misses something, you are putting enforcement in the wrong layer. Prompts and skills depend on the model reading them and choosing to comply. Hooks fire whether the model wants them to or not.
First split: capability, judgment, or enforcement
The first decision is not which extension to pick. It is which kind of thing the behavior is.
| You need… | Layer | Failure if mismatched |
|---|---|---|
| An action the agent does not have | Tool (built-in or MCP) | Agent fabricates output it cannot verify |
| A reusable method the agent should follow | Skill | Inconsistent application across sessions |
| A rule that must hold every time | Hook | Agent skips it under pressure or fresh state |
Tools answer “what can it do.” Skills answer “how should it do it.” Hooks answer “what must always happen around it.”
The four harness controls (context, model, prompt, tools) tell you which knobs exist per agent. This article tells you which knob to reach for given the behavior you are trying to encode.
Use an MCP tool when Claude needs a live handle
MCP is the right layer when Claude needs a connection to an external system: GitHub issues, Linear tickets, Slack messages, a database, an observability dashboard, a browser. The protocol exposes resources, prompts, and actions through a server the agent talks to over a channel.
Heuristics:
- Stateful or interactive belongs in MCP. If the agent is going to query something, watch for changes, or post back, you want a server, not a script.
- Auth or rate-limited belongs in MCP. The server can manage credentials, retries, and quotas in one place; bundling a script per project means duplicating that surface.
- Read-once and small can stay in a shell command. A
gh issue viewinvocation inside a skill is fine; standing up an MCP server for one issue lookup is overkill.
Watch the costs MCP introduces: tool name discovery at session start (cheap), schema fetch on first use (deferred but not free), and a process to keep alive. The Claude Code MCP page covers scope, plugin-provided servers, and authentication.
Use a skill when the hard part is knowing how
A skill is a folder with a SKILL.md file plus optional scripts, references, and assets. Activation is model-mediated: Claude reads every skill’s name and description at startup and pulls the full content into context only when the description matches the current task. The Agent Skills spec calls this progressive disclosure.
Two things make this layer load-bearing:
- The description is a router, not a label. “Security expert” undertriggers because nothing in it names a triggering situation. “Reviews code for auth and session-handling risks before commits” routes because it names the trigger and the scope. Test should-trigger and should-not-trigger prompts before shipping.
- A skill is reusable judgment, not a saved prompt. Bundle the rubric, the references, the helper scripts you want the model to call. Methodology you would otherwise paste into every conversation belongs here.
Use a skill when the hard part is the method, not the action. A code-review checklist, a release-notes shape, a debugging routine, a research workflow. The model still chooses when to apply it; you are giving it a better starting point, not a guardrail.
Use a hook when the rule must survive the model
A hook fires on a lifecycle event (PreToolUse, PostToolUse, Stop, SessionStart, others) and runs a handler: a shell command, an HTTP request, an MCP tool call, a prompt, or another agent. Hooks are deterministic. They run whether the model planned for them or not.
Use them for things that must happen every time:
- Block unsafe Bash. A
PreToolUsematcher onBashcan rejectrm -rf /,git push --forceonmain, or any command touching a path outside the workspace. - Format on write. A
PostToolUsematcher onEdit|Writeruns Prettier, gofmt, or your linter and reports back if the file is dirty. - Audit tool use. Log every
mcp__github__create_issuecall to a JSONL file the team can review later. The samemcp__<server>__<tool>pattern lets a hook also call MCP tools through amcp_toolhandler. - Reload config. A
SessionStarthook can refresh environment variables or re-pull a shared rulebook.
Hooks are powerful because they are outside the model. That is also their risk surface. Keep matchers narrow, commit only team-safe project hooks (.claude/settings.json), keep personal automation in .claude/settings.local.json, and never run unaudited network code from a hook handler.
What belongs in CLAUDE.md instead
CLAUDE.md is loaded every session. That makes it the most expensive context surface you have. Put things there only when every session needs them: build commands, repository conventions, hard working agreements, the architectural sketch a new contributor would want on day one.
Move occasional reference material out of CLAUDE.md and into a skill. A test: if the content only matters for one task type or one role, it belongs in a skill description, not the project’s always-on context. CLAUDE.md as a dumping ground is the most common context-cost mistake I see in production repos.
The context-cost table builders should actually use
| Surface | Loaded at session start | Loaded on use | Notes |
|---|---|---|---|
CLAUDE.md | Full content | n/a | Every session pays this. Keep it tight. |
| Skills | Names and descriptions only | Full SKILL.md plus referenced files | Cheap to register, expensive only when active. |
| MCP tools | Tool names | Schema and result | Defer schema until first call; budget for size. |
| Hooks | Nothing (unless they speak) | Handler output if returned to context | A silent hook is free; a chatty hook is not. |
| Subagents | Subagent definitions | Isolated per call | The parent only sees the final message. |
The practical implication: prefer skills over CLAUDE.md additions for anything that is not always relevant. Prefer MCP over inline scripts when the script would otherwise be duplicated across projects. Prefer hooks over prompt rules for any invariant you cannot afford to skip.
Failure modes when you pick the wrong layer
Four cases that come up often:
- Skill used as enforcement. “Always run the security review before commits” sits in a skill’s instructions. Claude follows it most of the time. Then it forgets once, and the team learns about it from a postmortem. Move the trigger to a
StoporPreToolUsehook. - Hook used for judgment. A
PostToolUsehandler tries to “review” the diff. Hooks run shell handlers, not models; the review either short-circuits to a regex or escalates back to the model and you have built a worse skill. Keep judgment in skills. - MCP server for a static checklist. Someone wraps a checklist in an MCP server because that is the layer they know. The server returns the same string every time. Make it a skill, ship the markdown, save the process.
- CLAUDE.md as a dumping ground. Every session pays for content most sessions do not need. Move per-task content to skills.
The named version of these is tool-selection errors: vague descriptions, missing tools, too many tools, ambiguous tool surfaces. They are layer-misplacement errors at root.
A concrete subagent tool-list recipe
Take a code-reviewer subagent. Four extension layers, four responsibilities:
# code-reviewer subagent
description: "Reviews changes for correctness and security before commits. Use proactively on feature branches."
tools: [Read, Grep, Glob] # capability: read-only navigation
skills: [security-review] # judgment: rubric and references
mcpServers: [github] # only when PR metadata is needed
model: claude-opus-4-7
Hooks live at the project level, not the subagent level. In .claude/settings.json:
{
"hooks": {
"PreToolUse": [
{
"matcher": "Bash",
"hooks": [
{ "type": "command", "command": ".claude/hooks/block-unsafe-bash.sh" }
]
}
],
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [{ "type": "command", "command": ".claude/hooks/format.sh" }]
}
]
}
}
Each layer carries one responsibility. Tools say what the reviewer can touch. The skill says how to review. The hooks say what the session as a whole must guarantee. The model picks the cost tier. Strip any of those four out and the failure has a clear shape: missing capability, missing method, missing guardrail, or wrong cost.
When the same kit (tools, skills, hooks, MCP servers) is needed across many repos, package it as a Claude Code plugin. Plugins exist for exactly this distribution problem; copying .claude/ directories by hand is the failure mode they prevent.
How to test the setup
Three checks before you ship a new layer:
- Skill triggers. Write five should-trigger and five should-not-trigger prompts. Run them. The skill should fire on the first set and stay quiet on the second. False-trigger means the description is too broad. Undertrigger means it does not name the situation.
- Hook smoke tests. Feed the matcher representative tool calls and confirm the handler runs and returns the expected exit code. A hook that fails silently is worse than no hook.
- MCP health. Run
/mcpto confirm the server is reachable, the tool list is what you expect, and the schemas are sized for your context budget.
For the subagent itself, run a known fixture: a repo state where you already know what a correct review looks like. If the subagent misses what the fixture flags, the gap is in one of the four layers above and you can locate which.
Frequently asked questions
Q1. What is the difference between a skill and a tool in Claude Code? A tool is an action surface Claude can call: read a file, run Bash, fetch from a web service, call an MCP server. A skill is reusable knowledge or workflow guidance the model reads and applies through the existing Skill tool. Use tools for capability. Use skills for judgment, process, and domain method.
Q2. When should I use a hook instead of a skill? Use a hook when the behavior must happen every time a matching event occurs. Formatting after edits, blocking unsafe commands, logging tool use, and validating protected paths belong in hooks. Use a skill when Claude needs to reason through a workflow and adapt the method to the task at hand.
Q3. When should I use MCP instead of a script bundled in a skill? Use MCP when Claude needs a reusable live connection to an external system: GitHub, Linear, Slack, a database, a browser, an observability service. Bundle a script in a skill when the script is part of one repeatable workflow and does not need to behave like a general interactive service.
Q4. Do subagents automatically inherit skills from the main session? No. Subagents run in isolated context and do not inherit skills already invoked in the parent conversation. If a subagent needs a methodology, pass that skill explicitly in the subagent definition. Otherwise it may have the right tools but miss the procedure those tools are supposed to serve.
Q5. What belongs in CLAUDE.md instead of a skill? Put always-needed project facts in CLAUDE.md: build commands, repository conventions, core architecture notes, hard working agreements. Move occasional reference material or repeatable workflows into skills. Test: if Claude needs the content in every session, use CLAUDE.md. If it only matters for one task type, use a skill.
Q6. How do I know whether a skill description is good enough? Test it. Write should-trigger and should-not-trigger prompts, run them several times, and check whether the skill activates at the right rate. Good descriptions name user intent, output shape, and trigger phrases. “Security expert” is a label. “Reviews code for auth and session risks before commits” is a router.
Q7. Are hooks risky?
Yes, because their value comes from deterministic execution outside the model. Keep matchers narrow, commit only team-safe project hooks, keep personal automation in local settings, and prefer handlers that are easy to audit. A hook that blocks rm -rf is a guardrail. A hook that runs opaque network code is supply-chain surface.
Where to go from here
Two siblings already live:
- The founding harness pillar, for the four-control framing this article specializes.
- Subagent design, for the per-role choices this article extends.
And the cluster ahead:
- Model choice per subagent (coming soon). Picking Opus, Sonnet, or Haiku per role, with the latency and cost tradeoffs made explicit.
- Evals that catch subagent silent failures (coming soon). Writing evals for the silent-failure mode the subagent article flagged, before they ship.