Why agentfootprint?
Your support agent told a customer their refund was processed. Six weeks later they come back and ask “why did you tell me my refund was processed when it wasn’t?” You go to look. The agent is gone. Logs are scattered across three services. The decision evidence is not there. This is the gap agentfootprint exists to close.
The new class of bug
For fifty years, software bugs have been logic errors. A wrong condition, a missed edge case, an off-by-one. You step through the code with a debugger until you find the bad branch.
LLM-powered apps add a second class: contextual errors. The code is correct. The model is correct. The answer is wrong because the LLM’s decision rests on context that was ambiguous, missing, or invalidated at the moment of inference.
Tracking which content the model actually saw, and why, is the entire debugging job. Without it, the failure mode is invisible:
| What got injected wrong | What the model did |
|---|---|
| Wrong instruction landed in the system slot | Followed the wrong rule |
| Predicate fired one iteration too early | Reasoned with stale assumptions |
| Skill body missing when LLM called read_skill | Invented its own |
| Cache prefix invalidated mid-iteration | Saw a silently rewritten stale version |
| Tool returned but on-tool-return injection didn’t fire | Couldn’t interpret the result |
A framework that owns the control flow can debug logic errors. A framework that owns the injection can debug contextual errors — because every injection is a typed event with a where, when, why, and how-it-cached.
The four backtrack questions
Every agentfootprint LLM call backtracks to four typed answers — and they’re the answers the customer who got the wrong answer is asking for, six weeks late:
| Question | What the trace tells you |
|---|---|
| What was injected? | Every flavor of content the LLM saw on this iteration (Skill bodies, RAG passages, Steering rules, tool schemas). |
| Who triggered it? | Which rule fired (always / rule predicate / on-tool-return / llm-activated read_skill). |
| When did it fire? | Which iteration of the ReAct loop, after which event (which tool returned, which skill activated). |
| How did it land? | Which slot (system / messages / tools), what position, what cache strategy, whether it was actually applied or skipped. |
These aren’t logged alongside the call. They’re the structural output of the call — produced by the framework owning the runtime loop.
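One way to picture the four answers is as a single typed record per injection. The sketch below is illustrative only: `InjectionRecord`, its field names, and `explainInjection` are hypothetical and are not agentfootprint's actual types. The trigger and slot values are taken from the table above.

```typescript
// Hypothetical record shape; NOT agentfootprint's real event type.
type Trigger = 'always' | 'rule' | 'on-tool-return' | 'llm-activated';
type Slot = 'system' | 'messages' | 'tools';

interface InjectionRecord {
  // What was injected?
  what: { flavor: 'skill' | 'rag' | 'steering' | 'tool-schema'; id: string };
  // Who triggered it?
  who: { trigger: Trigger };
  // When did it fire?
  when: { iteration: number; afterEvent: string };
  // How did it land?
  how: { slot: Slot; position: number; cache: string; applied: boolean };
}

// Turn one record into a human-readable answer to the four questions.
function explainInjection(r: InjectionRecord): string {
  const status = r.how.applied ? 'applied' : 'skipped';
  return (
    `${r.what.flavor} "${r.what.id}" was ${status} in the ${r.how.slot} slot ` +
    `at iteration ${r.when.iteration} (after ${r.when.afterEvent}), ` +
    `trigger: ${r.who.trigger}`
  );
}

// Example values are made up for illustration.
const record: InjectionRecord = {
  what: { flavor: 'skill', id: 'refund-policy' },
  who: { trigger: 'llm-activated' },
  when: { iteration: 2, afterEvent: 'read_skill' },
  how: { slot: 'system', position: 0, cache: 'prefix', applied: true },
};

console.log(explainInjection(record));
```

Because every field is typed, a question like "which injections were skipped at iteration 2?" becomes a filter over records rather than a search through log text.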
Connected evidence — built declaratively
agentfootprint gives you connected evidence — grounded, auditable, LLM-readable. Every iteration of the ReAct loop, every tool call with its args + result, every context injection, every decision branch is captured as a typed event during one DFS traversal — no instrumentation, no post-processing.
The agent + tool registration is built declaratively. The framework owns the loop, so it can record everything that happens inside it:
```typescript
// `provider`, `exampleProvider`, and `weatherRespond` are defined elsewhere in the example.
const agent = Agent.create({
  provider: provider ?? exampleProvider('feature', { respond: weatherRespond }),
  model: 'mock',
  maxIterations: 5,
})
  .system('You answer weather questions using the `weather` tool.')
  .tool({
    schema: {
      name: 'weather',
      description: 'Get current weather for a city.',
      inputSchema: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
    // The tool body runs for real, even when the LLM is mocked.
    execute: async (args) => `${(args as { city: string }).city}: sunny, 72°F`,
  })
  .build();
```

Observability is just attaching listeners — no SDK, no agent.observe wrapper, no separate tracing pipeline:
```typescript
// Log each tool call as it starts, with its typed arguments.
agent.on('agentfootprint.stream.tool_start', (e) =>
  console.log(`→ tool ${e.payload.toolName}(${JSON.stringify(e.payload.args)})`),
);

// Log the tool result when the call returns.
agent.on('agentfootprint.stream.tool_end', (e) =>
  console.log(`← tool result: ${e.payload.result}`),
);
```

Every event is typed. Every payload is what you’d want — toolName + args on tool_start, result on tool_end, iterationCount + token totals on turn_end. No string-parsing logs. No “I think this means the tool was called.”
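To illustrate why typed payloads remove the need to parse log strings, here is a minimal, self-contained sketch of a typed event map. The `TinyEmitter` class and the exact payload shapes in `StreamEvents` are assumptions made for illustration. Only the two event names and the `toolName`, `args`, and `result` fields come from the text above, and this is not the library's implementation.

```typescript
// Assumed payload shapes, keyed by the event names used above.
interface StreamEvents {
  'agentfootprint.stream.tool_start': { toolName: string; args: Record<string, unknown> };
  'agentfootprint.stream.tool_end': { result: string };
}

type Listener<P> = (e: { payload: P }) => void;
type AnyListener = (e: { payload: unknown }) => void;

// Hypothetical stand-in for an agent's event surface.
class TinyEmitter {
  private listeners: { [type: string]: AnyListener[] | undefined } = {};

  // The event name selects the payload type the listener receives.
  on<K extends keyof StreamEvents>(type: K, fn: Listener<StreamEvents[K]>): void {
    const list = this.listeners[type] ?? [];
    list.push(fn as AnyListener);
    this.listeners[type] = list;
  }

  // Emitting with a mismatched payload shape is a compile-time error.
  emit<K extends keyof StreamEvents>(type: K, payload: StreamEvents[K]): void {
    for (const fn of this.listeners[type] ?? []) fn({ payload });
  }
}

const emitter = new TinyEmitter();
const seen: string[] = [];

emitter.on('agentfootprint.stream.tool_start', (e) => {
  seen.push(`start:${e.payload.toolName}`); // toolName is known to be a string
});
emitter.on('agentfootprint.stream.tool_end', (e) => {
  seen.push(`end:${e.payload.result}`);
});

emitter.emit('agentfootprint.stream.tool_start', { toolName: 'weather', args: { city: 'Paris' } });
emitter.emit('agentfootprint.stream.tool_end', { result: 'Paris: sunny' });
```

In this sketch, accessing `e.payload.toolName` inside a `tool_end` listener would fail to compile, which is the practical payoff of keying payload types by event name.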
Three workflows from one trace
The same trace serves three workflows:
| Mode | What you do | What the trace gives you |
|---|---|---|
| Live | Debug as you build | Exactly which injection produced which token; which predicate fired this iteration; which prefix actually got cached |
| Offline | Monitor what shipped | Replay any past run from its trace. Alert on drift. Attribute cost per injection. |
| Detailed | Improve via export | Every successful trajectory is labeled training data for SFT, DPO, or process-RL — no separate data-collection phase |
And a fourth, novel workflow: the agent can read its own trace. Six months after the agent rejected loan #42, “why did you reject it?” is answered from the recorded evidence (creditScore=580, threshold=600), not from a rerun. Causal memory turns the trace into the agent’s working memory.
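The idea of answering from recorded evidence instead of rerunning can be sketched in a few lines. `DecisionRecord` and `answerWhy` are hypothetical names used only for this illustration. The loan values mirror the example above, and the real causal-memory API may look quite different.

```typescript
// Hypothetical record of one decision plus the evidence captured when it was made.
interface DecisionRecord {
  subject: string;
  outcome: 'approved' | 'rejected';
  evidence: Record<string, number>;
}

// A stored trace; in practice this would be loaded from wherever traces are persisted.
const storedTrace: DecisionRecord[] = [
  { subject: 'loan #42', outcome: 'rejected', evidence: { creditScore: 580, threshold: 600 } },
];

// Answer "why?" by reading the record. No model call and no rerun are involved.
function answerWhy(records: DecisionRecord[], subject: string): string {
  const hit = records.filter((r) => r.subject === subject)[0];
  if (!hit) return `No recorded decision for ${subject}.`;
  const facts = Object.keys(hit.evidence)
    .map((key) => `${key}=${hit.evidence[key]}`)
    .join(', ');
  return `${hit.subject} was ${hit.outcome} based on recorded evidence: ${facts}`;
}

console.log(answerWhy(storedTrace, 'loan #42'));
```

The point of the sketch is that the answer depends only on what was captured at decision time, so it stays stable even if the model, prompt, or threshold has changed since.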
Dynamic ReAct — context recomposes every iteration
This is the corollary that makes “context engineering” worth the name. Other frameworks assemble the prompt once per turn; agentfootprint re-runs every Injection trigger on every iteration. Tool schemas, system prompts, skill bodies, memory recall — all are recomputed against the freshest state.
| Iteration | Classic ReAct | Dynamic ReAct (agentfootprint) |
|---|---|---|
| 1 | 12 tools shown | 1 tool (read_skill) |
| 2 | 12 tools shown | 5 tools (skill activated) |
| 3 | 12 tools shown | 5 tools |
Per-iteration recomposition is also the structural prerequisite for the cache layer — cache markers can’t track active injections in lockstep without it.
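The tool counts in the table above can be reproduced with a small simulation of per-iteration recomposition. Everything here is a hypothetical sketch: `LoopState`, `composeTools`, and the skill's tool names are invented for illustration, and only `read_skill` and the 1 / 5 / 5 counts come from the table.

```typescript
// Hypothetical loop state: which skill, if any, the model has activated.
interface LoopState {
  activeSkill: string | null;
}

// Made-up tools that a "refunds" skill would unlock.
const SKILL_TOOLS: Record<string, string[] | undefined> = {
  refunds: ['lookup_order', 'check_policy', 'issue_refund', 'notify_customer'],
};

// Recomputed at the start of every iteration from the current state.
function composeTools(state: LoopState): string[] {
  if (state.activeSkill === null) return ['read_skill'];
  return ['read_skill'].concat(SKILL_TOOLS[state.activeSkill] ?? []);
}

const state: LoopState = { activeSkill: null };
const toolCounts: number[] = [];

for (let iteration = 1; iteration <= 3; iteration++) {
  toolCounts.push(composeTools(state).length);
  // Assume the model calls read_skill during iteration 1, activating the skill.
  if (iteration === 1) state.activeSkill = 'refunds';
}

console.log(toolCounts); // [1, 5, 5]
```

A classic loop would compute the tool list once before the loop starts. Moving `composeTools` inside the loop is the whole difference.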
📖 See the Dynamic ReAct guide for the full taxonomy of what this unlocks (tool-by-tool steering, adaptive tool exposure, cost guardrails, iterative format refinement, failure adaptation).
$0 test runs — scripted ReAct
Tool-using agents are notoriously hard to test because the LLM’s behavior is non-deterministic. mock({ replies }) solves that — script the LLM’s decisions turn-by-turn, run the agent against the script, get a deterministic, free, instant test:
```typescript
mock({
  replies: [
    // Iteration 1: the scripted "LLM" requests a tool call.
    {
      toolCalls: [
        {
          id: 'call-1',
          name: 'lookup',
          args: { topic: 'refunds' } as Record<string, unknown>,
        },
      ],
    },
    // Iteration 2: the scripted "LLM" returns the final answer.
    { content: 'Refunds take 3 business days.' },
  ],
});
```

Iteration 1 calls the tool with the scripted args; iteration 2 returns the final answer. The tool actually executes; the agent loop is real. Only the LLM is mocked. Swap mock(...) for anthropic(...) to ship — the rest of the agent is identical.
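To make "only the LLM is mocked" concrete, here is a simplified, synchronous sketch of a loop driven by scripted replies. `runScripted` and its types are hypothetical and written for this page; the actual agent loop is asynchronous and does far more. The reply shapes mirror the scripted replies above.

```typescript
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Reply = { toolCalls?: ToolCall[]; content?: string };
type Tool = (args: Record<string, unknown>) => string;

// Hypothetical mini-loop: each scripted reply stands in for one LLM turn.
function runScripted(
  replies: Reply[],
  tools: { [name: string]: Tool | undefined },
  maxIterations = 5,
): { answer: string; iterations: number; toolResults: string[] } {
  const toolResults: string[] = [];
  const limit = Math.min(replies.length, maxIterations);

  for (let i = 0; i < limit; i++) {
    const reply = replies[i];
    if (reply.toolCalls && reply.toolCalls.length > 0) {
      for (const call of reply.toolCalls) {
        const tool = tools[call.name];
        if (!tool) throw new Error(`Unknown tool: ${call.name}`);
        toolResults.push(tool(call.args)); // the tool really executes
      }
      continue; // the next scripted reply plays the next LLM turn
    }
    return { answer: reply.content ?? '', iterations: i + 1, toolResults };
  }
  throw new Error('Script ended without a final answer');
}

const scriptedRun = runScripted(
  [
    { toolCalls: [{ id: 'call-1', name: 'lookup', args: { topic: 'refunds' } }] },
    { content: 'Refunds take 3 business days.' },
  ],
  { lookup: (args) => `policy for ${String(args['topic'])}` },
);

console.log(scriptedRun.answer, scriptedRun.iterations, scriptedRun.toolResults);
```

Since the script fixes every model decision, the run is deterministic: the same replies and tools always produce the same answer, iteration count, and tool results.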
No graph DSL required
Composition is just chained .create() builders, not a separate graph language. Two agents in series? Sequence. Multiple critics with merge? Parallel. Iterate until quality bar? Loop. The compositions are how multi-agent systems are built — there’s no MultiAgentSystem class, no Orchestrator to learn:
```typescript
// `provider`, `exampleProvider`, `replies`, and `i` are defined elsewhere in the example.
const runner = reflection({
  provider:
    provider ??
    exampleProvider('pattern', { respond: () => replies[i++ % replies.length]! }),
  model: 'mock',
  proposerPrompt: 'Write or revise a short poem about night.',
  criticPrompt: 'Critique the poem. When it is good enough include the marker DONE.',
  maxIterations: 5,
});
```

That’s Reflexion (Shinn et al., 2023), expressed as Sequence(Agent, critique-LLM, Agent). Tree-of-Thoughts is Parallel(Agent × N) + LLM-rank. Debate is Loop(Agent × 2 + judge). Every named pattern in the agent literature is a composition of the same 2 primitives + 4 compositions. You learn the substrate; the field’s growth lands as recipes on top.
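The claim that named patterns reduce to a few compositions can be sketched with plain functions. The `sequence` and `loop` combinators below are hypothetical stand-ins, not agentfootprint's Sequence or Loop, and `propose` and `critique` are toy string functions playing the roles of a proposer agent and a critic LLM. The stop condition borrows the DONE marker from the example above.

```typescript
type Step = (input: string) => string;

// Run steps in order, feeding each output into the next step.
const sequence = (...steps: Step[]): Step => (input) =>
  steps.reduce((acc, step) => step(acc), input);

// Repeat a step until `done` says stop or the iteration budget runs out.
const loop = (body: Step, done: (output: string) => boolean, maxIterations: number): Step =>
  (input) => {
    let current = input;
    for (let i = 0; i < maxIterations; i++) {
      current = body(current);
      if (done(current)) break;
    }
    return current;
  };

// Toy proposer: drops the critic's note and "revises" by appending a character.
const propose: Step = (draft) => draft.replace(' [revise]', '') + '!';

// Toy critic: accepts drafts of 8+ characters by adding the DONE marker.
const critique: Step = (draft) =>
  draft.length >= 8 ? `${draft} DONE` : `${draft} [revise]`;

// A reflection-style pattern is then just a loop over a proposer/critic sequence.
const reflect = loop(sequence(propose, critique), (out) => out.indexOf('DONE') !== -1, 5);

const finalDraft = reflect('night');
console.log(finalDraft); // "night!!! DONE"
```

Starting from `'night'`, the critic rejects the first two revisions and accepts the third, so the loop stops after three of its five allowed iterations.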
Built on footprintjs
agentfootprint is the agent layer. footprintjs is the substrate — the flowchart-pattern execution engine that makes our typed-event stream + replayable traces automatic. footprintjs gives us composition primitives, state-machine semantics, durable workflow checkpoints, and 57+ typed events out of the box; we used the budget those abstractions would have cost us to invest deeply in the injection loop — the layer every other framework leaves to the developer.
You don’t need to learn footprintjs to use agentfootprint — but if you want to build your own primitives at this depth, start there.
Next steps
- Quick Start — build your first agent
- Key Concepts — the 5-layer taxonomy
- Reliability gate — rules-based retry / fallback / fail-fast around every LLM call
- Causal memory deep dive — how the trace becomes the agent’s working memory