agentfootprint mascot — it pulls scattered context in and hands back clean, traceable slots

open source · MIT · mock-first

Find the context that made your agent answer wrong.

Debug why your AI agent gave the wrong answer — and prove the fix by re-running without the cause.

Why is a query, not a guess.

Get started → Star on GitHub

refunds-agentrecording

01classify→refund

02retrieve→policy doc

03check→continue

04decide→approved ✗

0 steps · 0 tok · 0 ms↶ why?

Inject less. Trace more.

Faster debuggingtrace any answer to its exact cause

Provable causeproven by replay, not guessed

Lower token costcontext shrinks to what the step needs

Don't take the claim on faith. Scroll the story — a wrong answer traced to its cause, the context that built it, and the engine that recorded it all.

01It answered wrongAsking the model why only gets you a confident guess.

It approved a refund it should have denied.

Somewhere in the context you fed it, one piece flipped the decision. Which one?

The run

classify → refund · check → continue · decide → approved ✗

Can't you just ask a model?

gptthe customer history98%

claudethe policy doc95%

llamathe tone rule91%

Three confident answers, none falsifiable.

ContextReAct loop

System Prompt

Messages

Tools

messageAPIassemble

CallLLMsend request

Route

→ approved ✗

ToolCalls↻ loop again

↩ rewinding step 14 / 14

02Rewind to the causeEvery piece of context lands in one place, so you can trace back to it.

02 · the solution

Context engineering, abstracted.

That wrong document reached the System Prompt because something injected it. So here is how every piece of context gets in — making the cause a place you can rewind to. Skills, steering, RAG, facts, memory, guardrails: every name for context does one thing — it injects into one of three LLM slots, under one of four triggers, and the framework caches it for you.

Injection=slot×trigger×cache

The model — what we abstract

Many flavors. Three slots.

The data and instructions you collect wear many names. Each lands in system, messages, or tools — and several land in more than one. Scroll to map each flavor to the slot(s) it really injects into.

SteeringSkillGuardrailMemoryRAGFactTool API

one LLM call

system

messages

tools

When each one fires

Four triggers decide when.

A slot says where content lands; a trigger says when it fires. Scroll to watch each kind light up where in the loop it acts — from always-on rules to context the model unlocks itself by calling read_skill.

ContextReAct loop

System Prompt

Messages

Tools

messageAPIassemble

CallLLMsend request

Route

→ answer

ToolCalls↻ loop again

In your code

Declare the flavor. Not the prompt string.

You attach typed pieces — a fact, a rule, a skill. The framework decides which slot and which iteration each fires on, places the cache markers, and records every injection it makes.

support-agent.ts

const agent = Agent.create(({ provider, model })
  .system('You are a support agent.')
  .fact(defineFact(({           // data — always on → system
    id: 'user-profile',
    data: 'Name: Maya · Plan: Pro · since 2022',
  }))
  .steering(defineSteering(({   // steering — always on → system
    id: 'refund-policy',
    prompt: 'Never promise a refund before checking policy.',
  }))
  .skill(defineSkill(({         // unlocks via read_skill → system + tools
    id: 'billing',
    description: 'Use for refunds, charges, billing.',
    body: 'Confirm identity first, then…',
    tools: [refundTool, lookupCharge, issueCredit],
  }))
  .build();

How the assembly runs

The prompt recomposes every iteration.

The model reasons, decides which skill it needs, and the framework re-engineers all three slots— system, messages, and tools — around that decision. Tools the model can't use yet never enter the window, so the context shrinks to what each step needs — for tool-heavy agents, that’s where the savings come from. Scroll to walk the three iterations.

Task: “Refund my last charge.”

ContextReAct loop

steering onlySystem Prompt

Messages

1Tools

messageAPIassemble

CallLLMsend request

Route

→ answer

ToolCalls↻ loop again

Classic ReAct

The loop re-runs the injection step, but only the Messages slot recomposes — System Prompt and Tools are cached after turn 1. So all 12 tools ride along every turn, whether the step needs them or not.

loop → re-engineer · only Messages recomposes · system + 12 tools cached

Dynamic ReAct — agentfootprint

Same loop — but all three slots recompose every turn. Injections that fired on the last tool result rewrite the next prompt; tools appear only once unlocked.

loop → re-engineer · all 3 slots recompose · 1 → 5 tools, on demand

03Catch it before it answersThe same trace runs forward — see why it picked that, and fix it.

Same machinery, a different agent

Why this tool?

Backtracking proved why a past run broke. The same recorded panel works forward too — a travel agent picks one of 4 tools by their descriptions. Scroll to reveal the scores, sharpen the tie, then swap scorers.

thinking•••

LLM

✈search_flights

⌂search_hotels

▣book_hold

≡load_skill

The agent picked search_hotels by its description. So — why this tool?

agentfootprint owns the detection — the scores, the ties, the recorded graph. You own the policy — the scorer, the rewriter. We map; you decide.

04The run records itselfEvery step is captured as it happens, so you can rewind to any moment.

04 · how it works

The brain thinks, asks a tool, loops to the answer.

Every step is emitted as it happens — no instrumentation, no backtracking yet. agentfootprint just records the real flow as the loop runs.

The loop records itself.

As the agent runs, every event drains into a typed log — prompt · ask · return · answer — with its own cost. Scroll to time-travel the footprint you’ll later walk backward.

executionReAct loop · the hot path

ContextReAct loop1

System Prompt

Messages

Tools

messageAPIassemble

CallLLMsend request

Routeroute

Finalanswer

ToolCallsexecute

drain logtyped footprint · one row per node

1promptassemble the context for this turn180ms · 90tok

↳ logs collect as we run and connect as they execute — one row per node, with its own cost. This is the footprint.

Step 1 / 12 — The turn begins — Context is assembled. It emits to the recorder as it happens.

Your agent is the event loop.

Same recorded run, viewed from the runtime side. A stage runs on the call stack and feeds its trace events into a queue. Opt into deferred delivery and the engine drains them at the next microtask — at each stage boundary, one beat behind, off the hot path.

your codethe agent · stepping per stage

ContextReAct loop1

System Prompt

Messages

Tools

messageAPIassemble

CallLLMsend request

Routeroute

Finalanswer

ToolCallsexecute

the runtimethe event loop · one beat behind

↳ grey is JavaScript’s own machinery · green is footprintjs · in deferred mode the watching drains on the next microtask, off the hot path — near-zero overhead on the critical path.

Stage Context (1/12) — runs on the call stack and feeds its trace events into the queue, on the hot path.

05Proven, not guessedRecord the run, rewind to the cause, prove the fix by replaying it.

the whole system

How it all fits together.

Skills, RAG, memory, rules — composed into the system / messages / tools slots, run, and recorded as a traceable footprint you can reverse.

agentfootprint system overview: context sources compose through the agent into the system, messages, and tools slots, then the LLM produces a structured answer

open source · MIT

Stop guessing why your agent answered wrong.

Record every run. Reverse it to the exact cause. Prove the fix by replaying it.

Trace your first run → Star on GitHub

$ npm i agentfootprint