Skip to content
The Harness
Go back

Claude Code skills, tools, and hooks: when to use each

A practical rule for Claude Code extensions: tools give capability, skills give judgment, and hooks enforce invariants.

@theharnessio

A few weeks ago I added a line to a project’s CLAUDE.md: “Always run pnpm format after editing TypeScript files.” I committed it, ran a session, watched the agent edit four files, and end the turn before formatting any of them. Not the agent’s fault. I had written an invariant into a layer that delivers wishes.

The fix was a four-line PostToolUse hook. The deeper question, the one this article is about, is which Claude Code extension to reach for in the first place. Capability, judgment, or invariant. Each has a layer that fits it and a layer that fails it.

The short answer

A tool gives the agent an action it can take: read a file, run a command, search the web, call an external service. A skill gives it reusable judgment and workflow knowledge that the model reads on activation. A hook enforces behavior at lifecycle events, deterministically, outside the model. MCP is how you add custom tools when the built-in set (Read, Grep, Bash, Edit, Write, WebSearch, Skill, Agent, and friends) is not enough.

The rule that organizes the rest of this article: capability in tools, judgment in skills, invariants in hooks.

Why this matters for subagent design

This is the follow-up the founding subagent article punted on. A subagent has a tool list. It can also load skills, run inside hooks the project committed, and have MCP servers attached. Without a layer model, those choices collapse into “give it everything that might help” and subagent tool lists become a junk drawer.

A concrete tell: if you keep adding bullets to the system prompt every time a reviewer misses something, you are putting enforcement in the wrong layer. Prompts and skills depend on the model reading them and choosing to comply. Hooks fire whether the model wants them to or not.

First split: capability, judgment, or enforcement

The first decision is not which extension to pick. It is which kind of thing the behavior is.

You need…LayerFailure if mismatched
An action the agent does not haveTool (built-in or MCP)Agent fabricates output it cannot verify
A reusable method the agent should followSkillInconsistent application across sessions
A rule that must hold every timeHookAgent skips it under pressure or fresh state

Tools answer “what can it do.” Skills answer “how should it do it.” Hooks answer “what must always happen around it.”

The four harness controls (context, model, prompt, tools) tell you which knobs exist per agent. This article tells you which knob to reach for given the behavior you are trying to encode.

Use an MCP tool when Claude needs a live handle

MCP is the right layer when Claude needs a connection to an external system: GitHub issues, Linear tickets, Slack messages, a database, an observability dashboard, a browser. The protocol exposes resources, prompts, and actions through a server the agent talks to over a channel.

Heuristics:

Watch the costs MCP introduces: tool name discovery at session start (cheap), schema fetch on first use (deferred but not free), and a process to keep alive. The Claude Code MCP page covers scope, plugin-provided servers, and authentication.

Use a skill when the hard part is knowing how

A skill is a folder with a SKILL.md file plus optional scripts, references, and assets. Activation is model-mediated: Claude reads every skill’s name and description at startup and pulls the full content into context only when the description matches the current task. The Agent Skills spec calls this progressive disclosure.

Two things make this layer load-bearing:

Use a skill when the hard part is the method, not the action. A code-review checklist, a release-notes shape, a debugging routine, a research workflow. The model still chooses when to apply it; you are giving it a better starting point, not a guardrail.

Use a hook when the rule must survive the model

A hook fires on a lifecycle event (PreToolUse, PostToolUse, Stop, SessionStart, others) and runs a handler: a shell command, an HTTP request, an MCP tool call, a prompt, or another agent. Hooks are deterministic. They run whether the model planned for them or not.

Use them for things that must happen every time:

Hooks are powerful because they are outside the model. That is also their risk surface. Keep matchers narrow, commit only team-safe project hooks (.claude/settings.json), keep personal automation in .claude/settings.local.json, and never run unaudited network code from a hook handler.

What belongs in CLAUDE.md instead

CLAUDE.md is loaded every session. That makes it the most expensive context surface you have. Put things there only when every session needs them: build commands, repository conventions, hard working agreements, the architectural sketch a new contributor would want on day one.

Move occasional reference material out of CLAUDE.md and into a skill. A test: if the content only matters for one task type or one role, it belongs in a skill description, not the project’s always-on context. CLAUDE.md as a dumping ground is the most common context-cost mistake I see in production repos.

The context-cost table builders should actually use

SurfaceLoaded at session startLoaded on useNotes
CLAUDE.mdFull contentn/aEvery session pays this. Keep it tight.
SkillsNames and descriptions onlyFull SKILL.md plus referenced filesCheap to register, expensive only when active.
MCP toolsTool namesSchema and resultDefer schema until first call; budget for size.
HooksNothing (unless they speak)Handler output if returned to contextA silent hook is free; a chatty hook is not.
SubagentsSubagent definitionsIsolated per callThe parent only sees the final message.

The practical implication: prefer skills over CLAUDE.md additions for anything that is not always relevant. Prefer MCP over inline scripts when the script would otherwise be duplicated across projects. Prefer hooks over prompt rules for any invariant you cannot afford to skip.

Failure modes when you pick the wrong layer

Four cases that come up often:

The named version of these is tool-selection errors: vague descriptions, missing tools, too many tools, ambiguous tool surfaces. They are layer-misplacement errors at root.

A concrete subagent tool-list recipe

Take a code-reviewer subagent. Four extension layers, four responsibilities:

# code-reviewer subagent
description: "Reviews changes for correctness and security before commits. Use proactively on feature branches."
tools: [Read, Grep, Glob] # capability: read-only navigation
skills: [security-review] # judgment: rubric and references
mcpServers: [github] # only when PR metadata is needed
model: claude-opus-4-7

Hooks live at the project level, not the subagent level. In .claude/settings.json:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/block-unsafe-bash.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [{ "type": "command", "command": ".claude/hooks/format.sh" }]
      }
    ]
  }
}

Each layer carries one responsibility. Tools say what the reviewer can touch. The skill says how to review. The hooks say what the session as a whole must guarantee. The model picks the cost tier. Strip any of those four out and the failure has a clear shape: missing capability, missing method, missing guardrail, or wrong cost.

When the same kit (tools, skills, hooks, MCP servers) is needed across many repos, package it as a Claude Code plugin. Plugins exist for exactly this distribution problem; copying .claude/ directories by hand is the failure mode they prevent.

How to test the setup

Three checks before you ship a new layer:

For the subagent itself, run a known fixture: a repo state where you already know what a correct review looks like. If the subagent misses what the fixture flags, the gap is in one of the four layers above and you can locate which.

Frequently asked questions

Q1. What is the difference between a skill and a tool in Claude Code? A tool is an action surface Claude can call: read a file, run Bash, fetch from a web service, call an MCP server. A skill is reusable knowledge or workflow guidance the model reads and applies through the existing Skill tool. Use tools for capability. Use skills for judgment, process, and domain method.

Q2. When should I use a hook instead of a skill? Use a hook when the behavior must happen every time a matching event occurs. Formatting after edits, blocking unsafe commands, logging tool use, and validating protected paths belong in hooks. Use a skill when Claude needs to reason through a workflow and adapt the method to the task at hand.

Q3. When should I use MCP instead of a script bundled in a skill? Use MCP when Claude needs a reusable live connection to an external system: GitHub, Linear, Slack, a database, a browser, an observability service. Bundle a script in a skill when the script is part of one repeatable workflow and does not need to behave like a general interactive service.

Q4. Do subagents automatically inherit skills from the main session? No. Subagents run in isolated context and do not inherit skills already invoked in the parent conversation. If a subagent needs a methodology, pass that skill explicitly in the subagent definition. Otherwise it may have the right tools but miss the procedure those tools are supposed to serve.

Q5. What belongs in CLAUDE.md instead of a skill? Put always-needed project facts in CLAUDE.md: build commands, repository conventions, core architecture notes, hard working agreements. Move occasional reference material or repeatable workflows into skills. Test: if Claude needs the content in every session, use CLAUDE.md. If it only matters for one task type, use a skill.

Q6. How do I know whether a skill description is good enough? Test it. Write should-trigger and should-not-trigger prompts, run them several times, and check whether the skill activates at the right rate. Good descriptions name user intent, output shape, and trigger phrases. “Security expert” is a label. “Reviews code for auth and session risks before commits” is a router.

Q7. Are hooks risky? Yes, because their value comes from deterministic execution outside the model. Keep matchers narrow, commit only team-safe project hooks, keep personal automation in local settings, and prefer handlers that are easy to audit. A hook that blocks rm -rf is a guardrail. A hook that runs opaque network code is supply-chain surface.

Where to go from here

Two siblings already live:

And the cluster ahead:


Share this post on:

Next Post
How to design subagents that help rather than thrash