How agentfootprint thinks
A framework’s first job is to decide what it won’t be. agentfootprint is a substrate for context engineering — not an LLM SDK, not an agent platform, not a workflow orchestrator. This page is the perspective behind those rejections.
Why we exist — agents have a new class of bug
For fifty years, software bugs have been logic errors. A wrong condition, a missed edge case, an off-by-one. You step through the code with a debugger until you find the bad branch.
LLM-powered apps add a second class of bug: contextual errors. The code is correct. The model is correct. The answer is wrong because the LLM’s decision rests on context that was ambiguous, missing, or invalidated at the moment of inference.
Tracking which content the model actually saw, and why, is the entire debugging job. Without it, the failure mode is invisible:
| What got injected wrong | What the model did |
|---|---|
| Wrong instruction landed in the system slot | Followed the wrong rule |
| Predicate fired one iteration too early | Reasoned with stale assumptions |
| Skill body missing when the LLM called read_skill | Invented its own |
| Cache prefix invalidated mid-iteration | Saw a silently rewritten stale version |
| Tool returned but the on-tool-return injection didn’t fire | Couldn’t interpret the result |
That’s the gap agentfootprint exists to close. A framework that owns the control flow can debug logic errors. A framework that owns the injection can debug contextual errors — because every injection is a typed event with a where, when, why, and how-it-cached.
What we are
We are an abstraction over context engineering — the discipline of deciding what content lands in which slot of an LLM call, when, and why. Every LLM call has three slots: system, messages, tools. Every agent feature — Skills, Steering, Guardrails, RAG, Tool APIs, Memory — is content flowing into one of those slots, decided by some rule, at some moment in the iteration loop. agentfootprint models all of them as one primitive:
Injection = slot × trigger × cache

Three slots are fixed by the LLM API surface. Four triggers are fixed by when the framework can fire (always at build time, rule on a predicate, on-tool-return after a tool result, llm-activated when the LLM calls read_skill). Cache strategy is per-injection. Every flavor — present and future — fits this grid.
You describe injections declaratively. The framework evaluates every trigger every iteration, composes the slots, observes every decision as a typed event, and persists checkpoints you can replay six months later.
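The grid above can be sketched as a TypeScript type. This is an illustrative model of the slot × trigger × cache primitive, not agentfootprint's actual exports; all names here are assumptions.

```typescript
// Illustrative sketch of the Injection grid. Names are hypothetical,
// not the framework's real API surface.
type Slot = "system" | "messages" | "tools";

type Trigger =
  | { kind: "always" }                                     // fires at build time
  | { kind: "rule"; when: (s: AgentState) => boolean }     // predicate, re-evaluated per iteration
  | { kind: "on-tool-return"; tool: string }               // fires after a tool result
  | { kind: "llm-activated"; skill: string };              // fires when the LLM calls read_skill

type CacheStrategy = "none" | "prefix" | "per-iteration";

interface AgentState {
  iteration: number;
  lastToolResult?: unknown;
}

interface Injection {
  slot: Slot;
  trigger: Trigger;
  cache: CacheStrategy;
  content: (s: AgentState) => string;
}

// Example: a steering rule that lands in the system slot once a predicate fires.
const costGuardrail: Injection = {
  slot: "system",
  trigger: { kind: "rule", when: (s) => s.iteration > 5 },
  cache: "none",
  content: () => "You are over budget; prefer cheap tools.",
};
```

Every feature named above is some point in this grid: a Skill is a `llm-activated` trigger into messages, a Guardrail is a `rule` trigger into system, and so on.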
What that buys you
Because we own the injection, every LLM call backtracks to four typed answers:
- What was injected (which flavor, which content)
- Who triggered it (which rule fired)
- When it fired (which iteration, after which event)
- How it landed (which slot, with what cache strategy)
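One hypothetical shape for such a record, and the backtrack it enables: filter the trace to the injections that composed a given call. Field names are illustrative, not the framework's actual event schema.

```typescript
// Hypothetical shape of one injection event in the trace — field names
// are assumptions, not the real schema.
interface InjectionEvent {
  what: { flavor: string; content: string };        // what was injected
  who: { rule: string };                            // which rule fired
  when: { iteration: number; afterEvent: string };  // when it fired
  how: { slot: "system" | "messages" | "tools"; cache: string }; // how it landed
}

// Backtracking an LLM call: recover exactly which injections composed
// the context for a given iteration.
function backtrack(trace: InjectionEvent[], iteration: number): InjectionEvent[] {
  return trace.filter((e) => e.when.iteration === iteration);
}
```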
Same trace, three workflows:
| Mode | What you do | What the trace gives you |
|---|---|---|
| Live | Debug as you build | Exactly which injection produced which token; which predicate fired this iteration; which prefix actually got cached |
| Offline | Monitor what shipped | Replay any past run from its trace. Alert on drift. Attribute cost per injection. |
| Detailed | Improve via export | Every successful trajectory is labeled training data for SFT, DPO, or process-RL — no separate data-collection phase |
And a fourth, novel: the agent can read its own trace. Six months after the agent rejected loan #42, “why did you reject it?” answers from the recorded evidence (creditScore=580, threshold=600), not a rerun. Causal memory turns the trace into the agent’s working memory.
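A minimal sketch of that loan example, assuming a decision record with attached evidence (the shapes are invented for illustration): the answer is read from the trace, never recomputed.

```typescript
// Causal-memory sketch: "why?" is answered from recorded evidence,
// not from a rerun. Record shape is an assumption for illustration.
interface DecisionRecord {
  id: string;
  decision: string;
  evidence: Record<string, number>;
}

const trace: DecisionRecord[] = [
  { id: "loan-42", decision: "rejected", evidence: { creditScore: 580, threshold: 600 } },
];

function explain(id: string): string {
  const rec = trace.find((r) => r.id === id);
  if (!rec) return "no record";
  const facts = Object.entries(rec.evidence)
    .map(([k, v]) => `${k}=${v}`)
    .join(", ");
  // Six months later, this still answers from evidence captured at decision time.
  return `${rec.decision} because ${facts}`;
}
```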
What footprintjs gives us for free
The agent space has many credible primary abstractions — pipelines (LangChain), graphs (LangGraph), crews (CrewAI · AutoGen), typed bundles (Mastra · Genkit · Pydantic AI), compiled prompts (DSPy), durable workflows (Inngest AgentKit). We didn’t have to choose between them.
agentfootprint is built on footprintjs — the flowchart pattern for backend code. footprintjs gives us every one of those abstractions out of the box:
| Capability | What footprintjs hands us |
|---|---|
| Composition | Sequence · Parallel · Conditional · Loop |
| State machines | The ReAct loop is a flowchart |
| Multi-agent crews | Compose Agents through control flow — no special class needed |
| Durable workflows | pauseHere() plus JSON-portable resume() |
| Typed observation | 57+ events for free, because the framework owns the loop |
So we used the budget those abstractions would have cost us to invest deeply in something they all leave to the developer: the injection loop.
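The four compositions in the table reduce to ordinary typed higher-order functions. A minimal sketch of the pattern, not footprintjs's real API:

```typescript
// The four compositions as plain typed functions. Signatures are
// illustrative, not footprintjs's actual exports.
type Step<S> = (state: S) => Promise<S>;

// Sequence: thread state through steps in order.
const sequence = <S>(...steps: Step<S>[]): Step<S> =>
  async (s) => {
    for (const step of steps) s = await step(s);
    return s;
  };

// Conditional: branch on a predicate over the current state.
const conditional = <S>(pred: (s: S) => boolean, ifTrue: Step<S>, ifFalse: Step<S>): Step<S> =>
  async (s) => (pred(s) ? ifTrue(s) : ifFalse(s));

// Loop: repeat the body while the condition holds (the ReAct skeleton).
const loop = <S>(cond: (s: S) => boolean, body: Step<S>): Step<S> =>
  async (s) => {
    while (cond(s)) s = await body(s);
    return s;
  };

// Parallel: fan out over the same state, merge the results.
const parallel = <S>(merge: (results: S[]) => S, ...steps: Step<S>[]): Step<S> =>
  async (s) => merge(await Promise.all(steps.map((step) => step(s))));
```

Because a composed step is itself a `Step<S>`, crews and nested workflows fall out of ordinary function composition, which is the table's point.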
What we are not
- We are not an LLM SDK wrapper. Provider adapters are 100-line shims around vendor SDKs. They translate LLMRequest ↔ vendor format. They do not invent abstractions.
- We are not an agent platform. No deployment dashboard, no managed runtime, no saved-agent registry. You bring your own infrastructure.
- We are not a workflow orchestrator. Sequence / Parallel / Conditional / Loop are 4 compositions, not 40 step types. You will not find scheduled triggers, queue management, or retry-with-DLQ semantics here.
- We are not multi-modal. LLMMessage.content is a string. Images and video would multiply the surface; we said no.
- We are not a graph DSL. Compositions are typed functions, not a separate graph language. You will not write JSON or YAML to describe an agent.
These rejections are deliberate. Each thing we are not is a thing we do not have to maintain, document, observe, or version.
What we believe
The framework should own the loop
The biggest lesson from React, autograd, Prisma, Kubernetes — every load-bearing dev tool of the last decade — is that the framework owns the traversal. When the framework owns it, the framework can record everything that happens inside it. Without a single round-trip on your part. Without an instrumentation layer.
agentfootprint’s flowchart-pattern substrate (footprintjs) is what makes our typed-event stream + replayable traces automatic. You do not write agent.observe(...). You write .steering(rule) and the framework already knows when that rule fired, on which iteration, against which context.
Owning the loop means recomposing context every iteration — Dynamic ReAct
The corollary that makes “context engineering” worth the name. Static prompt assembly is what every framework does. Per-iteration recomposition — re-running every Injection trigger, recomputing the system prompt, recomputing the tool list, all based on the latest tool result + accumulated state — is what makes context engineering compositional instead of static.
This is structurally distinct from LangChain (assembles prompts once per turn), LangGraph (composes state per node, not per loop iteration), or CrewAI (tool-aware but not iteration-aware). It’s the closest a framework comes to “executive-function-like” behavior — context that adapts to what the agent just observed, not just what it was originally told.
The use cases that emerge are real, not theoretical: tool-by-tool LLM steering, adaptive tool exposure (per-skill gating), cost guardrails, iterative format refinement, failure adaptation, few-shot evolution, long-context skill body refresh. See the Dynamic ReAct guide for the full taxonomy.
If “context engineering” is the discipline, Dynamic ReAct is what makes the discipline expressive. Without it, the bar drops to static prompt assembly and we’d have nothing distinctive to say.
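The per-iteration recomposition described above can be sketched in a few lines. Everything here (the `State` and `Rule` shapes, `composeSystem`) is illustrative, not the real API; the point is that every predicate re-runs on every pass.

```typescript
// Per-iteration recomposition sketch. Nothing is assembled once and
// frozen: every predicate is re-evaluated against the latest state.
// All names are assumptions for illustration.
interface State { iteration: number; lastToolResult?: string }
interface Rule { content: string; when: (s: State) => boolean }

function composeSystem(rules: Rule[], s: State): string {
  // Runs fresh every loop pass — the system prompt tracks what the
  // agent just observed, not what it was originally told.
  return rules.filter((r) => r.when(s)).map((r) => r.content).join("\n");
}

const rules: Rule[] = [
  { content: "Base instructions.", when: () => true },
  { content: "The last tool call failed; try a different approach.",
    when: (s) => s.lastToolResult === "error" },
];

// Iteration 1: no tool result yet, only the base instruction lands.
const sys1 = composeSystem(rules, { iteration: 1 });
// Iteration 2: a tool just errored, so the steering line appears.
const sys2 = composeSystem(rules, { iteration: 2, lastToolResult: "error" });
```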
Every named pattern is a recipe, not a class
Reflexion, Tree-of-Thoughts, Self-Consistency, Debate, Map-Reduce, Swarm — every named pattern in the agent literature reduces to a composition of our 2 primitives + 4 compositions. We were one paper away from shipping a thirteenth Agent class when we stopped and asked: what if every new pattern is just a composition of what we already have? That question is why our core stays small as the field grows. New paper drops? New recipe in examples/patterns/. No new engine code.
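For instance, Self-Consistency is just parallel sampling plus a majority vote. A recipe-style sketch under assumed names; `sample` stands in for an LLM call:

```typescript
// Self-Consistency as a recipe: fan out N samples, majority-vote the
// answers. A composition sketch, not shipped code; names are assumptions.
async function selfConsistency(
  sample: () => Promise<string>,
  n: number,
): Promise<string> {
  // Parallel: N independent samples of the same question.
  const answers = await Promise.all(Array.from({ length: n }, () => sample()));
  // Reduce: count votes and return the most frequent answer.
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1);
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```

No new engine code is needed: the fan-out is a Parallel, the vote is an ordinary reduce over its results.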
Mocks are first-class, not an afterthought
Generative AI development is expensive when every iteration hits a paid API. We treat $0 development as a first-class workflow — the entire app (agent, context, memory, RAG, MCP) builds against in-memory mocks. Real infrastructure swaps in one boundary at a time. The flowchart, narrative, recorders, and tests do not change between dev and prod.
This is structurally different from “you can use mocks if you want.” We assume you will, and design for it.
Multi-tenant isolation is enforced at the storage boundary
Every memory call takes a MemoryIdentity tuple — { tenant?, principal?, conversationId }. Adapters MUST namespace internal keys by the full tuple. A bug passing the wrong tenant surfaces as “no data” — never as a cross-tenant leak. This is a non-negotiable property for any framework that wants to be deployed in regulated multi-tenant environments. We chose it as a default, not an option.
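A minimal in-memory adapter showing the property, with the MemoryIdentity shape taken from the prose above (the adapter class itself is hypothetical): keys are namespaced by the full tuple, so a wrong tenant reads as nothing.

```typescript
// Storage-boundary isolation sketch. MemoryIdentity matches the tuple
// described above; the adapter is an illustrative stand-in.
interface MemoryIdentity { tenant?: string; principal?: string; conversationId: string }

class InMemoryAdapter {
  private store = new Map<string, string>();

  // Every internal key embeds the full identity tuple.
  private key(id: MemoryIdentity, k: string): string {
    return [id.tenant ?? "-", id.principal ?? "-", id.conversationId, k].join("::");
  }

  set(id: MemoryIdentity, k: string, v: string): void {
    this.store.set(this.key(id, k), v);
  }

  get(id: MemoryIdentity, k: string): string | undefined {
    return this.store.get(this.key(id, k));
  }
}

const db = new InMemoryAdapter();
db.set({ tenant: "acme", conversationId: "c1" }, "note", "secret");

// A bug passing the wrong tenant surfaces as "no data" (undefined),
// never as another tenant's value.
const leaked = db.get({ tenant: "other", conversationId: "c1" }, "note");
```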
We will say “coming in vN.x” instead of pretending
When a feature isn’t shipped yet, we say so explicitly — even when prose would read smoother by glossing over it. CHANGELOG is the source of truth for what’s actually released. Code blocks for unshipped features are clearly marked as illustrative pseudo-code. This rule cost us some prose polish in the short term and bought us trust in the long term.
What we ask of you
If you adopt this framework, you adopt the discipline that comes with it:
- Pass per-tenant identity at every agent.run() in production. The default is for prototypes only.
- Use the typed event stream, not console.log inside tools. The framework already knows; subscribe.
- Build with mocks first. Run the agent end-to-end against mock() before you wire the real provider. Catch bugs in CI for $0; pay for tokens once you ship.
- Express new patterns as compositions of what exists. If you find yourself wanting a new primitive, ask first whether your idea is a Sequence-of-existing-stuff in disguise. It usually is.
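The mock-first point can be made concrete with a one-boundary swap. Everything below (the provider interface, `mockProvider`, `runAgent`) is a hedged sketch, not the framework's actual API:

```typescript
// Mock-first sketch: the agent depends only on a provider interface,
// so a $0 in-memory mock swaps for the real provider at one boundary.
// Names are assumptions for illustration.
interface LLMProvider {
  complete(prompt: string): Promise<string>;
}

// Deterministic, free, CI-friendly stand-in for a paid API.
const mockProvider = (canned: string): LLMProvider => ({
  complete: async () => canned,
});

async function runAgent(provider: LLMProvider, task: string): Promise<string> {
  // The agent body never changes between dev and prod — only `provider` does.
  return provider.complete(`Task: ${task}`);
}
```

In CI you run `runAgent(mockProvider("…"), task)` end-to-end; in production you pass the real adapter and nothing else changes.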
The unbuilt future
We have rejected building several things that other frameworks ship, some of them vigorously requested. Examples:
- A graph DSL — JavaScript is the DSL.
- A managed runtime — bring your own Lambda / container / cron.
- Per-step retry queues — that’s your queue’s job.
- A saved-agent registry — your filesystem + git already do this.
- Model-output validation built into the agent — use a tool for that and call it.
Each of these would expand the surface. None of them would make the abstraction sharper. We will continue saying no to additions that do not earn a place in the substrate.