
Skills

A support agent handles billing 5% of the time, tech questions 80% of the time, and refunds 15% of the time. Putting all three playbooks in the system prompt wastes tokens: the billing playbook is irrelevant on 95% of turns. Exposing each playbook as something the LLM activates on demand is what defineSkill does — the body + tools land in context only when the LLM asks for them.

A Skill is the LLM-activated flavor of the Injection primitive. It bundles:

  • A body — the playbook text the LLM follows when the skill is active
  • A set of tools — capabilities the LLM can call only after activating
  • A description — what the LLM sees BEFORE activating (used to decide whether to read the skill)

The library auto-attaches a read_skill tool to the agent. The LLM activates by calling read_skill('billing'); the framework looks up the skill, formats the body as the tool result, and lets the LLM use it on the next iteration.

examples/context-engineering/02-skill.ts (region: define-skill)
```ts
const billingSkill = defineSkill({
  id: 'billing',
  description: 'Read for refund / charge / billing questions. Unlocks process_refund.',
  body: 'When handling billing: confirm the order id, then call process_refund. Always state the amount + payment method in the final reply.',
  tools: [refundTool],
});
```

The body reads the way you’d brief a junior employee. The tools array holds capabilities only available once the skill is active — process_refund is locked away until the LLM has explicitly chosen to handle billing.

Why gate playbooks behind activation? Three reasons:

  1. Token cost — skill bodies are LARGE (200–2000 tokens of playbook text). Loading all of them on every turn is wasteful; loading on demand pays for a body only on the turns that need it.
  2. Context discipline — the LLM sees ONLY the playbook for what it’s currently doing. No cross-domain confusion.
  3. Capability scoping — process_refund exists in the agent’s tool registry only when billing is active. The LLM literally cannot call it during an unrelated turn — defense-in-depth against tool-call mistakes.
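The token-cost point is easy to quantify with a back-of-envelope model. The skill sizes and hit rates below are illustrative assumptions, not measurements from the library:

```typescript
// Illustrative numbers only: bodyTokens / descTokens / hitRate are assumptions.
const skills = [
  { id: 'billing', bodyTokens: 800, descTokens: 25, hitRate: 0.05 },
  { id: 'tech', bodyTokens: 1200, descTokens: 25, hitRate: 0.8 },
  { id: 'refunds', bodyTokens: 600, descTokens: 25, hitRate: 0.15 },
];

// All playbooks in the system prompt: every turn pays every body.
const alwaysOn = skills.reduce((sum, s) => sum + s.bodyTokens, 0); // 2600

// Skills: every turn pays the short descriptions, plus (on average)
// the one body the LLM actually reads.
const onDemand = skills.reduce(
  (sum, s) => sum + s.descTokens + s.hitRate * s.bodyTokens,
  0,
); // ≈ 1165

console.log({ alwaysOn, onDemand }); // on-demand roughly halves the spend here
```

The gap widens as skills get bigger or rarer: a 2000-token playbook hit 5% of the time costs ~125 expected tokens per turn instead of 2000.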

This is the pattern Anthropic shipped as “Agent SDK Skills” — agentfootprint reimplements it with cross-provider correctness via the Injection primitive, so it works identically on mock(), OpenAI, Bedrock, and Ollama.

The LLM’s tool-call to read_skill({ id: 'billing' }) is the activation event. The framework:

  1. Looks up billing in the agent’s registered skills
  2. Formats the body as a tool result the LLM sees on the next iteration
  3. Adds billing.tools to the agent’s tool registry for the rest of the turn
  4. Emits agentfootprint.skill.activated for observability

The skill stays active until the agent run ends. The next agent.run() starts fresh.
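The four activation steps can be sketched in a few lines of plain TypeScript. This is a mechanism sketch, not agentfootprint's implementation — the Skill/Tool shapes and the emit callback here are assumptions:

```typescript
type Tool = { name: string };
type Skill = { id: string; body: string; tools: Tool[] };

function activateSkill(
  skills: Map<string, Skill>,
  toolRegistry: Map<string, Tool>,
  emit: (event: string, payload: object) => void,
  id: string,
): string {
  // 1. Look up the skill in the agent's registered skills.
  const skill = skills.get(id);
  if (!skill) return `Unknown skill: ${id}`;
  // 3. The skill's tools join the registry for the rest of the run.
  for (const tool of skill.tools) toolRegistry.set(tool.name, tool);
  // 4. Emit an observability event.
  emit('agentfootprint.skill.activated', { id });
  // 2. The body becomes the tool result the LLM reads on the next iteration.
  return skill.body;
}
```

The important property is that steps 2 and 3 happen together: the playbook and its gated tools arrive in the same iteration, so the LLM never sees process_refund without its instructions.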

Per-provider surface selection (surfaceMode)

```ts
import { defineSkill, resolveSurfaceMode } from 'agentfootprint';

const billingSkill = defineSkill({
  id: 'billing',
  description: 'Refund / charge / billing.',
  body: 'When handling billing: confirm the order id...',
  tools: [refundTool],
  surfaceMode: 'auto', // resolves to 'both' on Claude ≥ 3.5; 'tool-only' elsewhere
});

// Pure inspector — see what 'auto' will resolve to in your stack:
resolveSurfaceMode('anthropic', 'claude-sonnet-4-5-20250929'); // → 'both'
resolveSurfaceMode('openai', 'gpt-4o'); // → 'tool-only'
```

Four modes: 'system-prompt', 'tool-only', 'both', 'auto'. See Skills, explained for the full per-provider attention argument.

What each mode actually does at runtime when the LLM activates the skill via read_skill('id'):

| surfaceMode | System slot (next iteration) | read_skill tool result |
| --- | --- | --- |
| 'system-prompt' | body lands here | confirmation only |
| 'tool-only' | body SUPPRESSED | body delivered verbatim |
| 'both' | body lands here | body delivered verbatim |
| 'auto' (default) | body lands here | confirmation only |
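The table reads as a two-output function of the mode. A minimal sketch in plain TypeScript (not the library's dispatch code; the confirmation string is an assumption):

```typescript
type SurfaceMode = 'system-prompt' | 'tool-only' | 'both' | 'auto';

// Where the body lands once read_skill fires, per the table above.
// 'auto' is modeled at its current dispatch-layer behavior ('system-prompt').
function surfaces(mode: SurfaceMode, body: string) {
  const inSystemSlot = mode !== 'tool-only'; // only 'tool-only' suppresses it
  const inToolResult = mode === 'tool-only' || mode === 'both';
  return {
    systemSlot: inSystemSlot ? body : undefined,
    toolResult: inToolResult ? body : 'Skill activated.', // confirmation only
  };
}
```

Note that every mode delivers the body somewhere: the choice is about which attention channel carries it, never whether it arrives.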

Two consequences worth knowing:

  1. 'tool-only' is recency-first by protocol. The LLM sees the body as the most recent tool result on the next iteration — providers’ attention to the latest message is consistently strong. No reliance on system-prompt training adherence.
  2. 'auto' preserves v2.4 behavior so existing consumers see no surprises. The Block A4 cascade resolves 'auto' against provider/model context (Claude ≥ 3.5 → 'both'; everything else → 'tool-only'); that runtime resolution lands in a future v2.5.x — for v2.5 today, 'auto' is treated as 'system-prompt' at the dispatch layer.
Long-context refresh — refreshPolicy (v2.4)

```ts
defineSkill({
  id: 'critical-rule',
  description: 'Critical reasoning rule for long-context runs',
  body: 'When the value is ambiguous, ask for clarification before acting.',
  refreshPolicy: { afterTokens: 50_000, via: 'tool-result' },
});
```

Re-injects the body via tool result past a token threshold. The runtime hook lands in v2.5 (long-context attention work); the API surface ships in v2.4 + is non-breaking.

For shared skill catalogs across multiple agents:

```ts
import { Agent, SkillRegistry } from 'agentfootprint';

const registry = new SkillRegistry();
registry.register(billingSkill).register(refundSkill).register(complianceSkill);

const supportAgent = Agent.create({ provider }).skills(registry).build();
const escalationAgent = Agent.create({ provider }).skills(registry).build();

// Add a skill — every consumer Agent picks it up at next build.
registry.register(newSkill);
```

agent.skills(registry) is the bulk-register companion to .skill(t). Use the registry pattern when 2+ agents share overlapping skills; use .skill(...) directly when one agent has its own catalog.

SkillRegistry methods: register(skill) · replace(id, skill) · unregister(id) · get(id) · has(id) · list() · clear() · size · toTools() · resolveForSkill(skillOrId, provider?, model?). Throws on duplicate register (use replace for explicit overwrites). Throws on non-Skill flavor inputs.

Registry-level defaults — new SkillRegistry({ surfaceMode, providerHint }) (v2.5)


When every skill in a registry should share the same surfaceMode, set it once on the constructor instead of repeating it on every defineSkill:

```ts
import { SkillRegistry } from 'agentfootprint';

// All skills here default to 'tool-only' (overrides defineSkill's 'auto')
const registry = new SkillRegistry({ surfaceMode: 'tool-only' });
registry.register(billingSkill); // billingSkill.surfaceMode 'auto' → resolves to 'tool-only'
registry.register(refundSkill);

// providerHint helps when the registry is composed far from the agent
// (test fixtures, design-time inspectors, multi-provider routing).
const registry2 = new SkillRegistry({ providerHint: 'anthropic' });
```

The cascade for surfaceMode resolution is:

  1. Per-skill explicit surfaceMode wins. defineSkill({ surfaceMode: 'both' }) is honored regardless of registry default.
  2. Registry’s surfaceMode ctor opt (if set + not 'auto').
  3. Global resolveSurfaceMode(provider, model) — Claude ≥ 3.5 → 'both', everything else → 'tool-only'.
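The cascade is a short fall-through. A sketch in plain TypeScript (not the library's code; the Claude version gate is elided here and simplified to a provider check, which the real resolver refines by model):

```typescript
type Resolved = 'system-prompt' | 'tool-only' | 'both';
type SurfaceMode = Resolved | 'auto';

function resolveCascade(
  skillMode: SurfaceMode | undefined,
  registryDefault: SurfaceMode | undefined,
  provider: string,
): Resolved {
  // 1. Per-skill explicit surfaceMode wins.
  if (skillMode && skillMode !== 'auto') return skillMode;
  // 2. Registry constructor default, if set and not 'auto'.
  if (registryDefault && registryDefault !== 'auto') return registryDefault;
  // 3. Global provider/model heuristic (simplified: the real rule is
  //    Claude ≥ 3.5 → 'both', everything else → 'tool-only').
  return provider === 'anthropic' ? 'both' : 'tool-only';
}
```

The key invariant is that 'auto' never survives resolution: every level either returns a concrete mode or falls through to the next.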

Inspect the resolved mode for any registered skill:

```ts
registry.resolveForSkill('billing', 'anthropic', 'claude-sonnet-4-5');
// → 'system-prompt' | 'tool-only' | 'both' (never 'auto')
```

Forward-compat: Block C (v2.5+) wires this cascade into the runtime so per-mode routing diversity (suppressing system-prompt for 'tool-only', etc.) takes effect. Today consumers express intent; runtime tightens later without API change.

registry.toTools() — explicit composition (v2.5)


When you want to wire skill discovery into a custom tool chain (e.g., a gatedTools layer that filters by role) instead of the Agent’s auto-attached read_skill, use toTools():

```ts
import { SkillRegistry } from 'agentfootprint';
import { gatedTools, staticTools } from 'agentfootprint/tool-providers';
import { PermissionPolicy } from 'agentfootprint/security';

const registry = new SkillRegistry();
registry.register(billingSkill).register(refundSkill);

const { listSkills, readSkill } = registry.toTools();
// listSkills: Tool — no-arg discovery (LLM calls to enumerate skills)
// readSkill: Tool — same as the auto-attached one (activation by id)

const policy = PermissionPolicy.fromRoles({...}, 'support');
const allTools = [listSkills!, readSkill!, lookupTool, refundTool];
const provider = gatedTools(staticTools(allTools), (n) => policy.isAllowed(n));
```

Two reasons to choose toTools() over the auto-attach:

  1. Token-efficient discovery — the auto-attached read_skill embeds the catalog in its description (every iteration’s tool list pays the cost). list_skills lets the LLM browse on demand; read_skill’s description can stay terse. For ~20+ skill registries, this matters.
  2. Permission gating — pass read_skill through gatedTools like any other tool, so a readonly role can see list_skills but not read_skill, or vice versa.

toTools() returns { listSkills: undefined, readSkill: undefined } for an empty registry — filter with .filter(Boolean) before adding to a tool list.
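That filter step can be made type-safe with a small helper. A hedged sketch (the `Tool` shape is a stand-in; plain `.filter(Boolean)` alone does not narrow the TypeScript element type):

```typescript
type Tool = { name: string };

// Drop the undefined slots an empty registry produces before composing a
// tool list. The type predicate narrows Array<Tool | undefined> to Tool[].
function compact(tools: Array<Tool | undefined>): Tool[] {
  return tools.filter((t): t is Tool => Boolean(t));
}
```

Usage: `compact([listSkills, readSkill, lookupTool])` yields a clean `Tool[]` whether or not the registry was empty.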

Per-skill tool gating — autoActivate (v2.5)


By default, a Skill’s tools array is ADDED to the agent’s tool registry on activation. With ~3 skills and ~5 tools each, that’s fine. With 20 skills and 100+ tools, the LLM’s choice space gets noisy — every iteration’s tool list pays the cost.

autoActivate: 'currentSkill' declares intent: when this skill is active, ONLY this skill’s tools should be visible (plus whatever baseline you compose alongside).

```ts
import { defineSkill } from 'agentfootprint';
import { skillScopedTools, staticTools, type ToolProvider } from 'agentfootprint/tool-providers';

const billingTools = [refundTool, chargeTool];
const billingSkill = defineSkill({
  id: 'billing',
  description: 'Billing assistance',
  body: '...',
  tools: billingTools,
  autoActivate: 'currentSkill',
});

// Materialize the gate manually today (Block C v2.5 wires this from
// skill.metadata.autoActivate automatically):
const baseline = staticTools([lookupOrderTool, listSkills, readSkill]);
const billingScope = skillScopedTools('billing', billingTools);
const refundScope = skillScopedTools('refund', [reverseTool]);

const provider: ToolProvider = {
  id: 'composite',
  list: (ctx) => [
    ...baseline.list(ctx),
    ...billingScope.list(ctx),
    ...refundScope.list(ctx),
  ],
};
```

What the LLM sees per iteration:

| ctx.activeSkillId | Visible tools |
| --- | --- |
| undefined (no skill) | lookup_order, list_skills, read_skill |
| 'billing' | baseline + refund, charge |
| 'refund' | baseline + reverse |
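The table is just a filter over ctx.activeSkillId. A self-contained sketch of the scoped-provider idea (plain TypeScript, not the library's skillScopedTools; the Tool/Ctx shapes are assumptions):

```typescript
type Tool = { name: string };
type Ctx = { activeSkillId?: string };
type ToolProvider = { id: string; list: (ctx: Ctx) => Tool[] };

// Expose `tools` only while `skillId` is the active skill.
function scopedProvider(skillId: string, tools: Tool[]): ToolProvider {
  return {
    id: `scope:${skillId}`,
    list: (ctx) => (ctx.activeSkillId === skillId ? tools : []),
  };
}

const baseline: ToolProvider = {
  id: 'baseline',
  list: () => [{ name: 'lookup_order' }, { name: 'list_skills' }, { name: 'read_skill' }],
};
const billing = scopedProvider('billing', [{ name: 'refund' }, { name: 'charge' }]);
const refund = scopedProvider('refund', [{ name: 'reverse' }]);

// What the LLM would see on a given iteration.
const visible = (ctx: Ctx): string[] =>
  [baseline, billing, refund].flatMap((p) => p.list(ctx)).map((t) => t.name);
```

Because `list` is re-evaluated with fresh context each iteration, activating a skill reshapes the very next tool list with no extra bookkeeping.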

This is a Dynamic ReAct payoff: the next iteration’s tool list reshapes based on what just happened. In large catalogs that means roughly a 3× context-budget reduction and sharper LLM tool choice.

Today: the autoActivate field is stored on skill.metadata.autoActivate; consumers wire skillScopedTools(...) manually. Block C (v2.5+): the runtime reads skill.metadata.autoActivate and wires the scope automatically when you agent.skills(registry).

When to use Skills vs Steering vs Instruction

| You want | Use |
| --- | --- |
| Always-on persona / tone / format | defineSteering |
| Conditional rule (predicate-based) | defineInstruction({ activeWhen }) |
| LLM-activated playbook + tools | defineSkill (this guide) |
| Cross-run state | defineMemory |
  • Don’t put always-relevant content in a skill. If it’s relevant 100% of the time, it belongs in the system prompt or as Steering. Skills are for sometimes-relevant.
  • Don’t define dozens of tiny skills. The LLM picks by description; too many descriptions to scan = analysis paralysis. 3–10 focused skills is the sweet spot.
  • Don’t put sensitive credentials in a skill body. Skill bodies are LLM-readable plaintext; treat them as you would the system prompt.
See also

  • Skills, explained — the conceptual essay (why this design, cross-provider correctness, three-stage anatomy)
  • Tools guide — the underlying tool primitive Skills compose over