
Why agentfootprint?

Your support agent told a customer their refund was processed. Six weeks later they come back and ask “why did you tell me my refund was processed when it wasn’t?” You go to look. The agent is gone. Logs are scattered across three services. The decision evidence is not there. This is the gap agentfootprint exists to close.

For fifty years, software bugs have been logic errors. A wrong condition, a missed edge case, an off-by-one. You step through the code with a debugger until you find the bad branch.

LLM-powered apps add a second class: contextual errors. The code is correct. The model is correct. The answer is wrong because the LLM’s decision rests on context that was ambiguous, missing, or invalidated at the moment of inference.

Tracking which content the model actually saw, and why, is the entire debugging job. Without it, the failure mode is invisible:

| What got injected wrong | What the model did |
| --- | --- |
| Wrong instruction landed in the system slot | Followed the wrong rule |
| Predicate fired one iteration too early | Reasoned with stale assumptions |
| Skill body missing when LLM called read_skill | Invented its own |
| Cache prefix invalidated mid-iteration | Saw a silently rewritten stale version |
| Tool returned but on-tool-return injection didn’t fire | Couldn’t interpret the result |

A framework that owns the control flow can debug logic errors. A framework that owns the injection can debug contextual errors — because every injection is a typed event with a where, when, why, and how-it-cached.

Every LLM call has 3 fixed slots (system, messages, tools); every flavor lands in one slot under one of 4 fixed triggers. The grid is the entire context-engineering surface.

Every agentfootprint LLM call traces back to four typed answers, and they are the answers the wrong-answer customer is asking for, six weeks late:

| Question | What the trace tells you |
| --- | --- |
| What was injected? | Every flavor of content the LLM saw on this iteration (Skill bodies, RAG passages, Steering rules, tool schemas). |
| Who triggered it? | Which rule fired (always / rule predicate / on-tool-return / llm-activated read_skill). |
| When did it fire? | Which iteration of the ReAct loop, after which event (which tool returned, which skill activated). |
| How did it land? | Which slot (system / messages / tools), what position, what cache strategy, whether it was actually applied or skipped. |

These aren’t logged alongside the call. They’re the structural output of the call — produced by the framework owning the runtime loop.
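To make the four answers concrete, here is a minimal sketch of what one such record might look like and how it could be queried. The `InjectionRecord` type, its field names, and `seenByModel` are hypothetical, invented for illustration; they are not agentfootprint's actual event schema.

```typescript
// Hypothetical shapes for illustration; not agentfootprint's real event schema.
type Slot = 'system' | 'messages' | 'tools';
type Trigger = 'always' | 'rule' | 'on-tool-return' | 'llm-activated';

interface InjectionRecord {
  what: string;      // flavor of content, e.g. a Skill body or a Steering rule
  trigger: Trigger;  // who triggered it
  iteration: number; // when it fired
  slot: Slot;        // how it landed
  applied: boolean;  // false if the injection was evaluated but skipped
}

// Everything the model actually saw on one iteration, grouped by slot.
function seenByModel(
  trace: InjectionRecord[],
  iteration: number,
): Record<Slot, string[]> {
  const seen: Record<Slot, string[]> = { system: [], messages: [], tools: [] };
  for (const record of trace) {
    if (record.iteration === iteration && record.applied) {
      seen[record.slot].push(record.what);
    }
  }
  return seen;
}

const sampleTrace: InjectionRecord[] = [
  { what: 'base-instructions', trigger: 'always', iteration: 1, slot: 'system', applied: true },
  { what: 'refund-policy-skill', trigger: 'llm-activated', iteration: 2, slot: 'system', applied: true },
  { what: 'tool-result-hint', trigger: 'on-tool-return', iteration: 2, slot: 'messages', applied: false },
];

// Only applied injections from iteration 2 are reported.
const iteration2 = seenByModel(sampleTrace, 2);
```

The skipped `tool-result-hint` record is the interesting case: it matches the "tool returned but the injection didn't fire" failure from the table above, and it only becomes visible because the record keeps an explicit `applied` flag.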

Connected evidence — built declaratively

agentfootprint gives you connected evidence — grounded, auditable, LLM-readable. Every iteration of the ReAct loop, every tool call with its args + result, every context injection, every decision branch is captured as a typed event during one DFS traversal — no instrumentation, no post-processing.

The agent + tool registration is built declaratively. The framework owns the loop, so it can record everything that happens inside it:

examples/core/02-agent-with-tools.ts (region: build)
const agent = Agent.create({
  provider: provider ?? exampleProvider('feature', { respond: weatherRespond }),
  model: 'mock',
  maxIterations: 5,
})
  .system('You answer weather questions using the `weather` tool.')
  .tool({
    schema: {
      name: 'weather',
      description: 'Get current weather for a city.',
      inputSchema: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
    execute: async (args) => `${(args as { city: string }).city}: sunny, 72°F`,
  })
  .build();

Observability is just attaching listeners — no SDK, no agent.observe wrapper, no separate tracing pipeline:

examples/core/02-agent-with-tools.ts (region: observe)
agent.on('agentfootprint.stream.tool_start', (e) =>
  console.log(`→ tool ${e.payload.toolName}(${JSON.stringify(e.payload.args)})`),
);
agent.on('agentfootprint.stream.tool_end', (e) =>
  console.log(`← tool result: ${e.payload.result}`),
);

Every event is typed. Every payload is what you’d want — toolName + args on tool_start, result on tool_end, iterationCount + token totals on turn_end. No string-parsing logs. No “I think this means the tool was called.”
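As a rough illustration of how an event name can determine its payload type at compile time, the sketch below builds a tiny stand-in emitter. The event names and payload fields are taken from the listeners above; the `EventMap` interface and `TypedEmitter` class are assumptions made for this example and do not describe agentfootprint's internals.

```typescript
// Illustrative event map: the names and fields come from the listeners above,
// but this emitter is a stand-in, not agentfootprint's implementation.
interface EventMap {
  'agentfootprint.stream.tool_start': { toolName: string; args: Record<string, unknown> };
  'agentfootprint.stream.tool_end': { result: string };
}

class TypedEmitter {
  private listeners = new Map<string, Array<(payload: unknown) => void>>();

  // The payload type is derived from the event name at compile time.
  on<K extends keyof EventMap>(name: K, listener: (payload: EventMap[K]) => void): void {
    const list = this.listeners.get(name) ?? [];
    list.push(listener as (payload: unknown) => void);
    this.listeners.set(name, list);
  }

  emit<K extends keyof EventMap>(name: K, payload: EventMap[K]): void {
    for (const listener of this.listeners.get(name) ?? []) {
      listener(payload);
    }
  }
}

const emitter = new TypedEmitter();
const seen: string[] = [];

emitter.on('agentfootprint.stream.tool_start', (payload) => {
  // payload.toolName is known to be a string; no log parsing needed.
  seen.push(`${payload.toolName}(${JSON.stringify(payload.args)})`);
});

emitter.emit('agentfootprint.stream.tool_start', { toolName: 'weather', args: { city: 'Paris' } });
```

With a map like this, a listener that reads `payload.result` on a `tool_start` event fails at compile time instead of logging `undefined` at runtime.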

The same trace serves three workflows:

| Mode | What you do | What the trace gives you |
| --- | --- | --- |
| Live | Debug as you build | Exactly which injection produced which token; which predicate fired this iteration; which prefix actually got cached |
| Offline | Monitor what shipped | Replay any past run from its trace. Alert on drift. Attribute cost per injection. |
| Detailed | Improve via export | Every successful trajectory is labeled training data for SFT, DPO, or process-RL, with no separate data-collection phase |

And a fourth, novel: the agent can read its own trace. Six months after the agent rejected loan #42, “why did you reject it?” answers from the recorded evidence (creditScore=580, threshold=600), not a rerun. Causal memory turns the trace into the agent’s working memory.

One agent run produces a JSON-portable causal trace. Three downstream consumers fan out: audit replay, cheap-model triage, training data export.
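The loan example can be sketched as a lookup over stored trace data rather than a rerun. The `DecisionRecord` shape and the `explain` helper below are hypothetical and exist only to illustrate the idea; the real trace format is not shown in this page.

```typescript
// Hypothetical decision record; field names are illustrative only.
interface DecisionRecord {
  subject: string;                  // what the decision was about
  outcome: 'approved' | 'rejected';
  evidence: Record<string, number>; // values the decision rested on
}

const records: DecisionRecord[] = [
  { subject: 'loan #42', outcome: 'rejected', evidence: { creditScore: 580, threshold: 600 } },
];

// A JSON-portable trace is plain data, so it can be stored and read back later.
const storedTrace: string = JSON.stringify(records);

// Answers "why?" from the recorded evidence instead of re-running the agent.
function explain(traceJson: string, subject: string): string {
  const trace = JSON.parse(traceJson) as DecisionRecord[];
  const record = trace.find((r) => r.subject === subject);
  if (!record) return `No recorded decision for ${subject}.`;
  const facts = Object.entries(record.evidence)
    .map(([name, value]) => `${name}=${value}`)
    .join(', ');
  return `${subject} was ${record.outcome} based on ${facts}.`;
}

const answer = explain(storedTrace, 'loan #42');
// answer: "loan #42 was rejected based on creditScore=580, threshold=600."
```

Because the answer is read from stored values, it stays the same even if the model, the prompt, or the threshold has changed since the original run.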

Dynamic ReAct — context recomposes every iteration

The corollary that makes “context engineering” worth the name. Other frameworks assemble the prompt once per turn; agentfootprint re-runs every Injection trigger every iteration. Tool schemas, system prompts, skill bodies, memory recall — all recompute against the freshest state.

Classic ReAct loops back to CallLLM (slots frozen). Dynamic ReAct loops back to SystemPrompt (slots recompose every iteration).

| Iteration | Classic ReAct | Dynamic ReAct (agentfootprint) |
| --- | --- | --- |
| 1 | 12 tools shown | 1 tool (read_skill) |
| 2 | 12 tools shown | 5 tools (skill activated) |
| 3 | 12 tools shown | 5 tools |

Per-iteration recomposition is also the structural prerequisite for the cache layer — cache markers can’t track active injections in lockstep without it.
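The tool counts in the table can be reproduced with a small sketch in which the tool list is recomputed at the top of every iteration. The `LoopState` type, the `SKILL_TOOLS` map, and the tool names are invented for this example; only `read_skill` appears in the text above.

```typescript
// Hypothetical state and skill contents, for illustration only.
interface LoopState {
  activeSkill: string | null;
}

const SKILL_TOOLS: Record<string, string[]> = {
  refunds: ['lookup_order', 'check_policy', 'issue_refund', 'notify_customer'],
};

// Recomputed at the top of every iteration instead of once per turn.
function composeTools(state: LoopState): string[] {
  const tools = ['read_skill'];
  if (state.activeSkill !== null) {
    tools.push(...(SKILL_TOOLS[state.activeSkill] ?? []));
  }
  return tools;
}

const state: LoopState = { activeSkill: null };
const toolsPerIteration: number[] = [];

for (let iteration = 1; iteration <= 3; iteration++) {
  toolsPerIteration.push(composeTools(state).length);
  // Stand-in for the model calling read_skill on the first iteration.
  if (iteration === 1) state.activeSkill = 'refunds';
}
// toolsPerIteration is [1, 5, 5], matching the Dynamic ReAct column.
```

A classic loop would call `composeTools` once before the loop and reuse the result, which is why its column never changes.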

📖 See the Dynamic ReAct guide for the full taxonomy of what this unlocks: tool-by-tool steering, adaptive tool exposure, cost guardrails, iterative format refinement, and failure adaptation.

Tool-using agents are notoriously hard to test because the LLM’s behavior is non-deterministic. mock({ replies }) solves that — script the LLM’s decisions turn-by-turn, run the agent against the script, get a deterministic, free, instant test:

examples/features/07-mock-multi-turn-replies.ts (region: scripted-replies)
mock({
  replies: [
    {
      toolCalls: [
        {
          id: 'call-1',
          name: 'lookup',
          args: { topic: 'refunds' } as Record<string, unknown>,
        },
      ],
    },
    { content: 'Refunds take 3 business days.' },
  ],
});

Iteration 1 calls the tool with the scripted args; iteration 2 returns the final answer. The tool actually executes; the agent loop is real. Only the LLM is mocked. Swap mock(...) for anthropic(...) to ship — the rest of the agent is identical.
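To show why a scripted LLM makes the loop deterministic, here is a self-contained sketch of the idea. It does not use agentfootprint's `mock` or its agent loop: `ScriptedReply`, `Tool`, and `runScripted` are hypothetical stand-ins, and the loop is kept synchronous for brevity.

```typescript
// Hypothetical reply shape mirroring the scripted replies above; not the library's types.
interface ScriptedReply {
  toolCalls?: { id: string; name: string; args: Record<string, unknown> }[];
  content?: string;
}

type Tool = (args: Record<string, unknown>) => string;

// Plays the script back one reply per iteration; the tools themselves run for real.
function runScripted(
  replies: ScriptedReply[],
  tools: Map<string, Tool>,
): { answer: string; toolResults: string[] } {
  const toolResults: string[] = [];
  for (const reply of replies) {
    if (reply.toolCalls && reply.toolCalls.length > 0) {
      for (const call of reply.toolCalls) {
        const tool = tools.get(call.name);
        if (tool === undefined) throw new Error(`Unknown tool: ${call.name}`);
        toolResults.push(tool(call.args));
      }
      continue; // loop back for the next scripted reply
    }
    return { answer: reply.content ?? '', toolResults };
  }
  throw new Error('Script ended without a final answer.');
}

const tools = new Map<string, Tool>([
  ['lookup', (args) => `policy for ${String(args['topic'])}`],
]);

const run = runScripted(
  [
    { toolCalls: [{ id: 'call-1', name: 'lookup', args: { topic: 'refunds' } }] },
    { content: 'Refunds take 3 business days.' },
  ],
  tools,
);
```

Since the only nondeterministic component has been replaced by a fixed list, every run executes the same tool calls in the same order and ends with the same answer.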

Composition is just chained .create() builders, not a separate graph language. Two agents in series? Sequence. Multiple critics with merge? Parallel. Iterate until quality bar? Loop. The compositions are how multi-agent systems are built — there’s no MultiAgentSystem class, no Orchestrator to learn:

examples/patterns/02-reflection.ts (region: reflexion-recipe — Reflexion as a 30-line recipe)
const runner = reflection({
  provider: provider ?? exampleProvider('pattern', { respond: () => replies[i++ % replies.length]! }),
  model: 'mock',
  proposerPrompt: 'Write or revise a short poem about night.',
  criticPrompt:
    'Critique the poem. When it is good enough include the marker DONE.',
  maxIterations: 5,
});

That’s Reflexion (Shinn et al., 2023), expressed as Sequence(Agent, critique-LLM, Agent). Tree-of-Thoughts is Parallel(Agent × N) + LLM-rank. Debate is Loop(Agent × 2 + judge). Every named pattern in the agent literature is a composition of the same 2 primitives + 4 compositions. You learn the substrate; the field’s growth lands as recipes on top.
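The "patterns are compositions" idea can be sketched with plain function combinators. The `sequence`, `parallel`, and `loop` helpers below are hypothetical and synchronous for brevity; they are not agentfootprint's Sequence, Parallel, or Loop, whose signatures are not shown here, and the proposer and critic are trivial stand-ins for LLM-backed agents.

```typescript
// Hypothetical combinators over plain functions; real runners would be async,
// and these are not agentfootprint's actual Sequence/Parallel/Loop signatures.
type Step = (input: string) => string;

// Run steps one after another, feeding each output into the next step.
const sequence = (...steps: Step[]): Step =>
  (input) => steps.reduce((acc, step) => step(acc), input);

// Run every step on the same input, then merge the outputs.
const parallel = (steps: Step[], merge: (outputs: string[]) => string): Step =>
  (input) => merge(steps.map((step) => step(input)));

// Repeat the body until the quality bar is met or the budget runs out.
const loop = (body: Step, done: (output: string) => boolean, maxIterations: number): Step =>
  (input) => {
    let current = input;
    for (let i = 0; i < maxIterations; i++) {
      current = body(current);
      if (done(current)) break;
    }
    return current;
  };

// Stand-ins for a proposer and a critic that approves the second draft.
const propose: Step = (draft) => `${draft}!`;
const critique: Step = (draft) => (draft.endsWith('!!') ? `${draft} DONE` : draft);

// One way to express a propose-then-critique cycle with the helpers above.
const reflect = loop(sequence(propose, critique), (out) => out.endsWith('DONE'), 5);
const finalDraft = reflect('night poem');

// Parallel candidates with a "pick the longest" merge as a toy ranking step.
const best = parallel(
  [(s) => `${s} (short)`, (s) => `${s} (much longer)`],
  (outputs) => outputs.reduce((a, b) => (b.length > a.length ? b : a)),
)('idea');
```

Here `finalDraft` ends up as `night poem!! DONE` after two passes, and `best` is `idea (much longer)`. Each combinator returns another `Step`, so compositions nest without a separate orchestration layer.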

agentfootprint is the agent layer. footprintjs is the substrate — the flowchart-pattern execution engine that makes our typed-event stream + replayable traces automatic. footprintjs gives us composition primitives, state-machine semantics, durable workflow checkpoints, and 57+ typed events out of the box; we used the budget those abstractions would have cost us to invest deeply in the injection loop — the layer every other framework leaves to the developer.

You don’t need to learn footprintjs to use agentfootprint — but if you want to build your own primitives at this depth, start there.