If you have spent any time configuring an AI coding agent, you have probably figured out that rules and skills are different things. Rules are always loaded. Skills are invoked on demand. Rules handle recognition; skills handle procedure.
The interesting problems start after you have internalized that distinction and started building on it. I have been working with AI coding tools for over two years now, starting with Windsurf and building progressively more sophisticated systems with Claude Code. Recently I went into the Claude Code source code to understand how these mechanisms actually work at the implementation level. What I found changed how I think about the tradeoff.
Rules and skills do not just load at different times. They occupy different positions in the system, and the model treats them differently because of it.
The Failure Modes Tell You Everything
The basic distinction is useful, but it becomes powerful when you frame it through failure modes.
If you miss the moment to act, that is a rule problem. The rule was not in context when the trigger fired, so the agent did not recognize that something should happen. The moment passed silently. A rule that is not present when its trigger fires is a rule that does not exist.
If you miss a step in how to act, that is a skill problem. The agent recognized the situation but did not have the detailed procedure available. Skills contain the instructions for how to do something, and they only need to be present when that something is actively happening.
Two failure modes with two different tools to handle them. Every configuration decision becomes a question about which failure mode you are guarding against.
How Your Agent Actually Processes Input
Before I walk through the lifecycles, it helps to understand what is happening under the hood when your agent reads your configuration.
Every time Claude Code sends a request to the model, it sends several things, but two matter most for understanding where your configuration lives: a system prompt and a list of messages. These are architecturally separate inputs, and the model is trained to treat them differently.
The system prompt contains the behavioral instructions that Anthropic wrote for Claude Code: how to use tools, how to handle permissions, how to format output. This is the directive layer. You do not write this. It is the same for every Claude Code user.
You might expect your rules to live in that system prompt. They don't. The messages contain everything else: your conversation history, your questions, the tool results, and critically, your CLAUDE.md and rules content. Your rules are injected as the very first message in this conversation, wrapped in a special tag that signals "this is context, not a user question." Skills, when invoked, arrive later in the conversation as additional messages.
This means your rules are not system-level directives the way Anthropic's instructions are. They are the first thing the model reads in the conversation, which gives them a significant positional advantage, but they live in the same layer as everything else you say. Understanding this distinction matters for how you design your configuration.
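To make the layering concrete, here is a schematic of what one Claude Code turn might send, using the system/messages split of the Anthropic Messages API. The tag names, wording, and exact message layout are illustrative, not the literal strings Claude Code uses:

```python
# Schematic of a single request. Only the system/messages split is real;
# the contents and tag names are invented for illustration.
request = {
    # Anthropic's behavioral instructions. You do not write this,
    # and it is the same for every Claude Code user.
    "system": "You are Claude Code. Use tools like so. Handle permissions like so.",
    "messages": [
        # First message of the conversation: your rules and CLAUDE.md,
        # wrapped in a tag marking them as context, not a question.
        {"role": "user", "content": "<context>Rules: always run tests before committing.</context>"},
        # Everything you actually type comes after.
        {"role": "user", "content": "Refactor the auth module."},
        {"role": "assistant", "content": "..."},
        # An invoked skill lands here, mid-conversation.
        {"role": "user", "content": "<skill>Deploy procedure: step 1...</skill>"},
    ],
}
```

The point of the sketch: your rules are the first entry in `messages`, not part of `system`, and a skill arrives wherever the conversation happens to be.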
Rules: The Stable Prefix
Rules load at session start and stay cached until compaction. Every rule file from .claude/rules/ and your CLAUDE.md content is read, combined, and injected as the first message in your conversation. This happens before you type anything.
That first-message position is important for three reasons.
Positional advantage. The model reads the entire context on every turn, but its attention is not uniform. Research has documented a U-shaped pattern: the beginning and end of the context get the strongest attention, while the middle gets the weakest. This is called the "lost in the middle" effect. Current models have been trained to mitigate it, so the effect is less dramatic than it was in 2023, but positional advantage is still real.
Anthropic's own long-context documentation recommends putting queries and instructions at the end of long contexts, after reference material. They are designing around the same attention dynamics.
Your rules sit at the very beginning of the conversation. Always. Your first message and the first assistant response also benefit from this beginning-of-context advantage. As the conversation grows, an invoked skill lands wherever you happen to be in the session, which in a long conversation means the middle: the weakest attention zone.
Prompt caching. Your rules do not change between turns. The model processed them on turn one, and since they are identical on turn two, the system can skip reprocessing them. This is prompt caching. It means rules are not just persistent; they are computationally cheap after the first turn. The same content arriving later in the conversation as a skill would not get this benefit, because the content before it may have changed.
Never compacted. When your conversation gets long enough, Claude Code compacts it: summarizing older messages to free up space. Your rules are never part of this compaction. They are rebuilt from disk rather than compressed with the conversation. Full fidelity, every time, regardless of how long the session runs.
The cost: every token in a rule pays rent on every turn. A 500-token rule costs 500 tokens of context on every API call. Over a 100-turn session, that single rule consumes 50,000 tokens of context. The cost is invisible, but real.
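The arithmetic is linear, which is what makes it sneak up on you. A sketch of the rent calculation (the function name is mine, not anything in Claude Code):

```python
def cumulative_rule_cost(rule_tokens: int, turns: int) -> int:
    """Total context tokens a rule occupies across a session.

    A rule is re-sent on every API call, so its footprint grows
    linearly with turn count. Prompt caching makes it cheap to
    *process*, but it still occupies context window on each call.
    """
    return rule_tokens * turns

# The example from the text: a 500-token rule over a 100-turn session.
print(cumulative_rule_cost(500, 100))  # 50000
```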
Path-Scoped Rules
There is a mechanism between always-loaded rules and on-demand skills that solves a specific problem neither can handle well.
Path-scoped rules use YAML frontmatter to specify which files they apply to. The first time you read a matching file, the rule content attaches to the file read result and enters the conversation at that point, the same position where a skill would land. It costs zero tokens until that first match. If you never touch a matching file in a session, it never loads.
The trigger is mechanical. No model judgment needed, no 250-character description to interpret. You read a file, the system checks the glob, and matching rules attach. A new path-scoped rule file created mid-session is picked up on the next matching file read, no restart needed.
I use this for convention files that only matter when working with specific directories. A rule about project structure loads when I touch files in projects/. A rule about blog formatting loads when I start iterating on posts.
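As a sketch, a path-scoped rule file might look like this. The frontmatter key for the glob (`paths` here) and the file contents are my assumptions; check your version's documentation for the exact field name:

```markdown
---
# Hypothetical frontmatter: glob of files this rule applies to.
paths: "posts/**/*.md"
---
# Blog formatting conventions
- Use sentence case for headings.
- Keep paragraphs under five sentences.
- Front matter must include a title and a date.
```

Until a file under `posts/` is read, this costs nothing. After the first matching read, it behaves like any other in-conversation content.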
Skills: The Conversation Layer
Skills start small. At session startup, only a listing of available skills loads: each skill's name and a brief description, capped at 250 characters. The full SKILL.md content stays on disk. This listing is how the model knows what skills exist and when to invoke them.
When a skill is invoked, either by you typing /skill-name or by the model deciding it is relevant, the full content is read from disk and injected into the conversation as messages. Not as a system prompt. Not as first-position content. As conversation messages that arrive at the current point in the chat, interleaved with everything else.
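A minimal skill, as a sketch. The `.claude/skills/<name>/SKILL.md` layout and the `name`/`description` frontmatter fields follow the public skill format; the skill itself is invented:

```markdown
<!-- .claude/skills/release/SKILL.md -->
---
name: release
description: "Step-by-step release procedure: version bump, changelog, tag, publish. Invoke when cutting a release."
---
# Cutting a release
1. Bump the version in package.json.
2. Move Unreleased entries in CHANGELOG.md under the new version.
3. Tag the commit and push the tag.
4. Publish and verify the artifact.
```

At session start, only the `name` and `description` reach the model. The numbered steps stay on disk until someone types `/release` or the model decides the description matches the situation.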
This is the key architectural difference. Once invoked, a skill's content is in the conversation, and it stays there: it is not removed after the turn, and it is not temporary. But it is not in the same position as your rules either.
It arrives mid-conversation. A skill invoked on turn 5 sits between turn 4 and turn 6. As the conversation grows to turn 50, that skill content is deep in the middle of a long message history, competing with 45 turns of context for the model's attention. Your rules, by contrast, are still at the very beginning.
It is subject to compaction. When Claude Code compacts the conversation, invoked skills are preserved, but with limits. Each skill is capped at 5,000 tokens post-compaction, and the total budget across all invoked skills is 25,000 tokens. Skills are sorted by how recently they were invoked. If you have invoked more skills than the budget can hold, the oldest ones get dropped.
This means a rule and a skill with identical content are not equivalent, even after the skill is invoked. The rule persists at full fidelity in first position forever. The skill can be truncated or dropped under pressure.
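The compaction budget the text describes can be sketched as a small function: each invoked skill is capped at 5,000 tokens, a shared 25,000-token budget applies across skills, and recency wins. The names and structure here are mine; the real implementation differs:

```python
PER_SKILL_CAP = 5_000    # max tokens one skill keeps post-compaction
TOTAL_BUDGET = 25_000    # shared budget across all invoked skills

def skills_after_compaction(invoked):
    """invoked: list of (name, tokens), oldest invocation first.

    Returns the skills that survive compaction, oldest first.
    """
    kept, used = [], 0
    for name, tokens in reversed(invoked):       # walk newest first
        tokens = min(tokens, PER_SKILL_CAP)      # truncate oversized skills
        if used + tokens > TOTAL_BUDGET:
            break                                # the oldest ones get dropped
        kept.append((name, tokens))
        used += tokens
    kept.reverse()
    return kept
```

With six 5,000-token skills invoked, the budget holds five: the oldest is dropped, and any skill over 5,000 tokens is truncated to the cap.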
The Split Pattern
Understanding the layer difference makes it clear why the split pattern matters.
The most common configuration problem I find is a single rule file trying to do both jobs: recognition and procedure. A few lines of recognition logic at the top ("when this situation arises, do this"), followed by fifty lines of procedural detail. The recognition needs to be always-loaded to catch the trigger. The procedure does not.
The fix: take the recognition concern and keep it as a rule. Three to five lines, always loaded, positionally advantaged. Take the procedural concern and move it to a skill that loads only when invoked. The recognition fires in the stable prefix. The procedure loads just in time into the conversation.
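As a sketch (the file names and contents are invented for illustration), the split might look like this:

```markdown
<!-- Before: one always-loaded rule doing both jobs -->
<!-- .claude/rules/migrations.md -->
When a change touches the database schema, create a migration.
Step 1: generate the migration file...
Step 2: ...
(forty more lines of procedure, paying rent on every turn)

<!-- After: the rule keeps only recognition -->
<!-- .claude/rules/migrations.md -->
When a change touches the database schema, create a migration.
Invoke the migration skill for the procedure.

<!-- ...and the procedure becomes a skill, loaded on demand -->
<!-- .claude/skills/migration/SKILL.md -->
---
name: migration
description: Procedure for creating, reviewing, and applying schema migrations.
---
Step 1: generate the migration file...
Step 2: ...
```

The two-line rule pays rent; the forty-line procedure does not.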
I applied this to five rules in my own configuration. The split saved roughly 120 lines of always-loaded context. Same behavior, dramatically less overhead. And now the procedure benefits from being loaded fresh at the point of use, rather than sitting in context for turns where it is irrelevant.
There is a reliability argument here too, not just a cost one. A skill's own description (capped at 250 characters, sitting somewhere in the conversation messages) is what the model reads when deciding whether to auto-invoke it. That description is competing with everything else in the conversation for attention. A three-line rule in the stable prefix will fire more reliably than a 250-character skill description hoping the model recognizes the situation on its own. If reliable triggering matters, the trigger belongs in a rule regardless of how good the skill description is. The rule catches the moment. The skill delivers the procedure. Each mechanism doing what it is best at.
A Decision Framework
When I add new behavior to my system, I ask four questions.
Does this need to be recognized before it is invoked? If yes, the trigger belongs in a rule. Keep it short. Just enough for pattern matching.
Does this require detailed procedural steps? If yes, those steps belong in a skill that loads on demand.
Is this tied to a specific file context? If yes, a path-scoped rule gives you deferred loading with mechanical reliability. No model judgment needed. Zero cost until you first touch a matching file.
Does this need to persist through long sessions at full fidelity? If yes, it belongs in a rule. Skills can be truncated after compaction in extended conversations.
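The four questions compose; a behavior can need more than one mechanism, such as a short rule as the trigger plus a skill carrying the procedure. A toy encoding (entirely mine, just to show the questions are not mutually exclusive):

```python
def placement(needs_recognition: bool, detailed_procedure: bool,
              file_scoped: bool, must_survive_compaction: bool) -> set[str]:
    """Map the four questions to mechanisms. A behavior may need several."""
    out = set()
    if needs_recognition or must_survive_compaction:
        out.add("rule")                 # triggers and full-fidelity content
    if detailed_procedure and not must_survive_compaction:
        out.add("skill")                # procedure that can load on demand
    if file_scoped:
        out.add("path-scoped rule")     # mechanical, deferred loading
    return out
```

For example, a trigger plus a long procedure yields both a rule and a skill, which is exactly the split pattern.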
The goal is the right information in context at the right time. Every token should be there because the current task needs it.
There Is a Third Piece
Rules handle recognition. Skills handle procedure. There is a third mechanism in Claude Code that handles neither. It handles the things that should happen automatically, without judgment, every single time. No recognition needed, no procedure to follow. Just a deterministic response to a specific event.
Think of it as the autonomic nervous system of your configuration. Your rules are conscious decisions. Your skills are learned procedures you invoke deliberately. Hooks are your reflexes, the responses that fire without you thinking about them, because if you had to think about them, you would eventually forget.
That deserves its own post. For now, if you find yourself writing a rule that says "every time X happens, always do Y," you are probably describing a hook, not a rule. The difference between "remember to do this" and "this just happens" is the difference between compliance and architecture.
If you are interested in the engineering principles behind structuring AI configuration at scale, I wrote about applying SOLID principles to this problem in a separate post. The context cost argument and the separation-of-concerns patterns are covered there in depth.