Observability

Logs that say “agent ran” are useless. Logs that say “agent called redact_pii({text: '...'}) and then injected the result into the messages slot before iteration 3, costing $0.0042” let you debug a hallucination at 2 AM. The difference is whether observability is structured + typed at the source or string-formatted at the sink.

Three workflows from one trace

agentfootprint’s observability isn’t bolted on. The framework owns the loop, so every decision and execution is recorded during DFS traversal — not collected after the fact via spans. That single artifact serves three workflows:

Mode	What you do	What the trace gives you
Live	Debug as you build	Exactly which injection produced which token; which predicate fired this iteration; which prefix actually got cached
Offline	Monitor what shipped	Replay any past run from its trace. Alert on drift. Attribute cost per injection.
Detailed	Improve via export	Every successful trajectory is labeled training data for SFT, DPO, or process-RL — no separate data-collection phase

There is also a fourth, novel workflow: the agent can read its own trace. Six months after the agent rejected loan #42, the question “why did you reject it?” is answered from the recorded evidence (creditScore=580, threshold=600), not from a rerun. Causal memory turns the trace into the agent’s working memory.

One agent run produces a JSON-portable causal trace. Three downstream consumers fan out from it: audit replay, cheap-model triage, and training-data export. A fourth consumer is the agent itself, which can read its own trace.
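To make the "answer from evidence, not a rerun" idea concrete, here is a minimal sketch of looking up a past decision in a stored trace. The `DecisionRecord` shape and the `explainDecision` helper are assumptions made for illustration; they are not agentfootprint's actual trace schema or API.

```typescript
// Hypothetical, simplified shape of one recorded decision in a trace.
// agentfootprint's real trace schema may differ.
interface DecisionRecord {
  subject: string;                  // what the decision was about
  outcome: 'approved' | 'rejected';
  evidence: Record<string, number>; // values captured at decision time
}

// Answer "why?" from recorded evidence only; nothing is re-executed.
function explainDecision(records: DecisionRecord[], subject: string): string {
  const hit = records.find((r) => r.subject === subject);
  if (!hit) return `No recorded decision for ${subject}`;
  const facts = Object.entries(hit.evidence)
    .map(([key, value]) => `${key}=${value}`)
    .join(', ');
  return `${subject} was ${hit.outcome} based on ${facts}`;
}

// A JSON-portable trace survives a round trip through storage.
const storedTrace = JSON.stringify([
  { subject: 'loan #42', outcome: 'rejected', evidence: { creditScore: 580, threshold: 600 } },
]);
const loadedRecords = JSON.parse(storedTrace) as DecisionRecord[];

console.log(explainDecision(loadedRecords, 'loan #42'));
```

The point of the sketch is that the explanation is a pure read over stored data, so it stays stable even if the model, prompt, or threshold has changed since the original run.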

The model

Every runner (Agent, LLMCall, Sequence, etc.) exposes the same observability surface, in three layers:

Layer	API	When to use
Events	`runner.on('agentfootprint.<domain>.<event>', fn)`	You want a typed stream — payloads are auto-completed by your IDE
Recorders	`runner.attach(recorder)`	You want pre-built aggregation (cost totals, token counts, narrative entries)
Tier-3 helpers	`runner.enable.thinking({...})`, `runner.enable.logging({...})`	You want a one-liner for the common case (status line, structured logs)

There are 60+ typed events across 18 domains: agent (incl. output_schema_validation_failed for Instructor-style retry), composition, context, stream, tools, skill, memory, cache, cost, permission (incl. halt for sequence governance), eval, embedding, pause, error, fallback, resilience, reliability, risk. Every event has a typed payload — gen_ai.chat, tool.start, cost.tick, composition.iteration_start, etc. The framework owns the loop and emits during DFS traversal — no instrumentation, no spans-after-the-fact.

Tier-3 — `.enable.thinking()` and `.enable.logging()`

The 80% case: a live status line (“what is the agent doing right now?”) and a domain-filtered log firehose. Both are one-liners.

// Live status line — user-facing "what's the agent doing right now".
const stopThinking = agent.enable.thinking({
  onStatus: (status) => console.log(`  ⎈ ${status}`),
});

.enable.thinking({ onStatus }) gives you a Claude-Code-style live status line — one string per agent state transition. Use for terminal UIs, CLI progress, or chat-UI typing indicators.

// Firehose logging filtered to stream + agent domains. The logger
// object wraps console, pino, winston, etc. — any object with a
// `log(message, data?)` method.
const stopLogging = agent.enable.logging({
  domains: [LoggingDomains.STREAM, LoggingDomains.AGENT],
  logger: {
    log: (message) => console.log(`  [log] ${message}`),
  },
});

.enable.logging({ domains, logger }) gives you structured logs filtered to specific event domains. The logger is any object with a log(message, data?) method — drop in console, pino, winston, or your own.

Both return a stop() function — call it on shutdown to detach the listener cleanly.
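Since the logger only needs a `log(message, data?)` method, adapting it to your own sink is a few lines. The sketch below is one possible shape, not part of the library: the `LogSink` type name and both helper functions are hypothetical.

```typescript
// Assumed shape of the logger contract described above: an object with
// a `log(message, data?)` method. The type name is ours, not the library's.
interface LogSink {
  log(message: string, data?: unknown): void;
}

// Emits one JSON object per line, which most log shippers can parse.
function jsonLineLogger(write: (line: string) => void): LogSink {
  return {
    log(message, data) {
      write(JSON.stringify({ ts: new Date().toISOString(), message, data }));
    },
  };
}

// In-memory variant, handy for asserting on logs in tests.
function memoryLogger(): LogSink & { lines: string[] } {
  const lines: string[] = [];
  return { lines, log: (message) => void lines.push(message) };
}

const capturedLines: string[] = [];
const jsonLogger = jsonLineLogger((line) => capturedLines.push(line));
jsonLogger.log('agent.turn_start', { turn: 1 });
```

Either object could then be passed as the `logger` option of `.enable.logging({ domains, logger })`.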

Tier-3 — `.enable.observability()` — grouped vendor strategies

For shipping events to a vendor backend (CloudWatch, X-Ray, OTel, AgentCore), .enable.observability(...) accepts a typed strategy + optional detach driver. The strategy is the WHERE; the detach is the HOW (sync inline vs fire-and-forget):

import { agentcoreObservability } from 'agentfootprint/observability-providers';
import { microtaskBatchDriver } from 'footprintjs/detach';

const stop = agent.enable.observability({
  strategy: agentcoreObservability({
    region: 'us-east-1',
    logGroupName: '/agentfootprint/my-agent',
  }),
  // Recommended for production — slow exporters never block the agent loop.
  // The strategy's `exportEvent(event)` runs on the driver's schedule
  // instead of inline on the dispatcher's hot path.
  detach: { driver: microtaskBatchDriver, mode: 'forget' },
});

Detach modes

`detach.mode`	Returns	Use when
omitted (no `detach`)	sync	Strategy is fast (< 100µs). Default, back-compat.
`'forget'` (default when `detach` set)	`void`	Pure fire-and-forget telemetry.
`'join-later'`	`onHandle(h)` callback fires per event	You want to `await` exports later (tests, backpressure).

Drivers

Pick a driver by environment. All drivers are exported from 'footprintjs/detach'; see that package for the full list:

Driver	Best for
`microtaskBatchDriver`	Default. Cross-runtime. Lowest latency.
`setImmediateDriver`	Node only. Yields to I/O before flushing.
`setTimeoutDriver`	Cross-runtime, configurable delay.
`sendBeaconDriver`	Browser. Survives page-unload (analytics).
`workerThreadDriver`	CPU-isolated. For heavy serialization.
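To show why a detached exporter stays off the hot path, here is a minimal sketch of the microtask-batching idea. It is an illustration under our own assumptions, not footprintjs's implementation: `createMicrotaskBatcher`, `enqueue`, and `flushNow` are hypothetical names.

```typescript
// Illustrative only: the batching idea, not footprintjs's actual driver.
function createMicrotaskBatcher<T>(exportBatch: (batch: T[]) => void) {
  let pending: T[] = [];
  let scheduled = false;

  function flushNow(): void {
    scheduled = false;
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    try {
      exportBatch(batch);
    } catch {
      // Fire-and-forget: an exporter failure must never reach the caller.
    }
  }

  return {
    // Cheap and synchronous: the hot path only pushes onto an array.
    enqueue(item: T): void {
      pending.push(item);
      if (!scheduled) {
        scheduled = true;
        // A resolved promise callback runs as a microtask, after the
        // current synchronous work but before timers and I/O.
        Promise.resolve().then(flushNow);
      }
    },
    flushNow,
  };
}

const exportedBatches: string[][] = [];
const batcher = createMicrotaskBatcher<string>((batch) => exportedBatches.push(batch));
batcher.enqueue('stream.token');
batcher.enqueue('stream.token');
batcher.enqueue('cost.tick');
// All three events reach the exporter as one batch once the current
// synchronous work finishes (or immediately if flushNow() is called).
```

The design choice worth noting is that the caller never awaits anything: emitting an event costs one array push, and the exporter's latency is paid later, once per batch.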

Graceful shutdown

flushAllDetached() (from 'footprintjs/detach') drains every in-flight handle process-wide:

import { flushAllDetached } from 'footprintjs/detach';

process.on('SIGTERM', async () => {
  const stats = await flushAllDetached({ timeoutMs: 10_000 });
  console.log(`Drained ${stats.done}, failed ${stats.failed}, pending ${stats.pending}`);
  process.exit(stats.pending === 0 ? 0 : 1);
});

Vendor adapters — `agentfootprint/observability-providers`

Vendor strategies ship under one grouped subpath (parallel-providers pattern, mirrors llm-providers / tool-providers / memory-providers):

Adapter	Status	Peer dep (optional)
`agentcoreObservability`	✅ shipped	`@aws-sdk/client-cloudwatch-logs`
`cloudwatchObservability`	✅ shipped	`@aws-sdk/client-cloudwatch-logs`
`xrayObservability`	✅ shipped	`@aws-sdk/client-xray`
`otelObservability`	✅ shipped	`@opentelemetry/api` + `@opentelemetry/sdk-node`
`datadogObservability`	roadmap	`dd-trace` (use `otelObservability` with Datadog OTel collector for now)

All peer deps are declared optional — consumers who never call a particular factory don’t need to install its SDK. Each adapter lazy-imports its SDK at first use.

enable.cost(...) accepts the same detach option for the same reasons. enable.thinking and enable.lens deliberately stay sync — UI render must feel responsive.

Cost tracking

Pass a pricingTable to the Agent constructor and the framework emits agentfootprint.cost.tick after every LLM call with per-call and cumulative USD. Add a costBudget and you also get a one-shot cost.limit_hit the first time cumulative crosses the budget. The library never auto-aborts — you decide what to do (cancel via signal, escalate, log, etc.):

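// The 'feature' kind drives the smart tool-call flow. Cost ticks fire
// automatically from the per-iteration usage that MockProvider estimates
// (chars / 4), which is enough to demo the budget crossing.
// `provider`, `pricing`, and `exampleProvider` are assumed to be defined
// elsewhere in the example file (not shown here).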
const agent = Agent.create({
  provider: provider ?? exampleProvider('feature'),
  model: 'demo-sonnet',
  pricingTable: pricing,
  costBudget: 0.0001, // trip the warning
})
  .system('')
  .tool({
    schema: { name: 'noop', description: '', inputSchema: { type: 'object' } },
    execute: () => 'ok',
  })
  .build();

agent.on('agentfootprint.cost.tick', (e) => {
  const p = e.payload;
  console.log(
    `[tick] +$${p.estimatedUsd.toFixed(6)} — cumulative $${p.cumulative.estimatedUsd.toFixed(6)}`,
  );
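});

// One-shot: fires the first time cumulative cost crosses `costBudget`.
// The library does not abort; decide here whether to cancel, escalate, or log.
// (The payload is not inspected because its exact shape is not shown here.)
agent.on('agentfootprint.cost.limit_hit', () => {
  console.warn('[budget] cost budget exceeded');
});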
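The arithmetic behind a tick and a one-shot budget crossing can be sketched in a few lines. This is a standalone illustration, not the library's code: the `ModelPrice` shape (USD per million tokens), the `CostTick` fields, and `createCostMeter` are all assumptions, and the real `pricingTable` format may differ.

```typescript
// Hypothetical pricing shape: USD per one million tokens.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface CostTick {
  estimatedUsd: number;  // this call only
  cumulativeUsd: number; // all calls so far
  limitHit: boolean;     // true exactly once, on the crossing call
}

function createCostMeter(price: ModelPrice, budgetUsd: number) {
  let cumulativeUsd = 0;
  let limitReported = false;

  return function recordCall(inputTokens: number, outputTokens: number): CostTick {
    const estimatedUsd =
      (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1_000_000;
    cumulativeUsd += estimatedUsd;

    // One-shot: report the crossing once, then stay quiet.
    const limitHit = !limitReported && cumulativeUsd > budgetUsd;
    if (limitHit) limitReported = true;

    return { estimatedUsd, cumulativeUsd, limitHit };
  };
}

const recordCall = createCostMeter({ inputPerMTok: 3, outputPerMTok: 15 }, 0.01);
const firstTick = recordCall(1_000, 200);  // 0.003 + 0.003 = $0.006
const secondTick = recordCall(1_000, 200); // cumulative $0.012 crosses $0.01
const thirdTick = recordCall(1_000, 200);  // already reported, no second hit
```

Because the crossing is reported rather than enforced, the caller stays in control, for example by calling `abort()` on an `AbortController` whose signal was passed to the run.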

Event taxonomy

agentfootprint.<domain>.<event>

Domain	Sample events	What it covers
`agent`	`turn_start` · `turn_end` · `iteration_start` · `iteration_end` · `route_decided` · `handoff` · `output_schema_validation_failed`	ReAct loop boundaries; `output_schema_validation_failed` fires inside the reliability gate when the LLM’s final answer fails the agent’s `outputSchema` (v2.13+)
`composition`	`enter` · `exit` · `iteration_start` · `iteration_exit` · `fork_start` · `branch_complete` · `merge_end` · `route_decided`	Sequence / Parallel / Conditional / Loop lifecycles
`context`	`injected` · `evaluated` · `evicted` · `slot_composed` · `budget_pressure` · `memory`	Context engineering — every injection’s full lifecycle
`stream`	`llm_start` · `llm_end` · `token` · `tool_start` · `tool_end`	Token-by-token + tool-call lifecycle
`tools`	`offered` · `activated` · `deactivated` · `discovery_started` · `discovery_completed` · `discovery_failed`	ToolProvider activation per iteration (skills, gating); async-provider lifecycle (start/complete/fail with `durationMs`)
`skill`	`activated` · `deactivated`	LLM-activated skill resolution
`memory`	`attached` · `detached` · `written` · `strategy_applied`	Memory pipeline subflows
`cache`	`applied` · `metrics`	Provider cache marker placement + hit-rate
`cost`	`tick` · `limit_hit`	Per-call USD + budget threshold crossings
`permission`	`check` · `gate_opened` · `gate_closed` · `halt`	PermissionChecker decisions; `halt` fires when a checker terminates the run via sequence governance (v2.12+)
`eval`	`score` · `threshold_crossed`	Custom quality / guardrail signals
`embedding`	`generated`	Embedder calls (RAG / semantic memory)
`pause`	`request` · `resume`	Human-in-the-loop checkpoints
`error`	`fatal` · `recovered` · `retried`	Stage-level error events
`fallback`	`triggered`	Schema-validation fallback chain
`resilience`	`circuit_state_changed` · `retry` · `fallback` · `output_fallback_triggered` · `output_canned_used`	v2.10.x reliability primitives
`reliability`	`fail_fast`	v2.11.5 reliability gate fail-fast events
`risk`	`flagged`	Eval guardrails flagging suspect output

Every event is exhaustively typed via AgentfootprintEvent (a discriminated union); your IDE autocompletes the payload shape based on the event name. No string-parsing logs. No “I think this means the tool was called.”
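For readers new to discriminated unions, the sketch below shows the narrowing mechanism on a three-member stand-in. `SketchEvent` is not the real `AgentfootprintEvent`: the `type` discriminant field is an assumption, and the payload fields are borrowed from the examples on this page.

```typescript
// Simplified stand-in for a discriminated event union. Payload fields are
// taken from the examples on this page; the real types carry more detail.
type SketchEvent =
  | { type: 'agentfootprint.cost.tick'; payload: { estimatedUsd: number } }
  | { type: 'agentfootprint.context.injected'; payload: { source: string; slot: string } }
  | {
      type: 'agentfootprint.agent.turn_end';
      payload: { iterationCount: number; totalInputTokens: number; totalOutputTokens: number };
    };

function describeEvent(e: SketchEvent): string {
  // Switching on the discriminant narrows `e.payload` in each branch.
  switch (e.type) {
    case 'agentfootprint.cost.tick':
      return `cost +$${e.payload.estimatedUsd.toFixed(4)}`;
    case 'agentfootprint.context.injected':
      return `injected ${e.payload.source} into ${e.payload.slot}`;
    case 'agentfootprint.agent.turn_end':
      return `turn ended after ${e.payload.iterationCount} iterations`;
    default: {
      // Exhaustiveness check: adding a union member without a case
      // above becomes a compile-time error here.
      const unhandled: never = e;
      return unhandled;
    }
  }
}

console.log(describeEvent({ type: 'agentfootprint.cost.tick', payload: { estimatedUsd: 0.0042 } }));
```

Accessing `e.payload.slot` inside the `cost.tick` branch would fail to compile, which is the guarantee that string-formatted logs cannot give.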

Custom recorders

For aggregations the built-ins don’t cover, write your own. A recorder is just an object with on* methods matching the event names:

const myRecorder = {
  id: 'my-recorder',
  onContextInjected: (e) => {
    // e.payload is fully typed — { source, slot, role?, content?, ... }
  },
  onAgentTurnEnd: (e) => {
    // e.payload — { iterationCount, totalInputTokens, totalOutputTokens, ... }
  },
};

runner.attach(myRecorder);

Handler methods are discovered on the recorder via runtime method-shape detection, so implement only the events you care about and ignore the rest.
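One way to picture method-shape detection is a name mapping plus a `typeof` check. The sketch below infers the naming rule from the two handlers shown above (`context.injected` to `onContextInjected`, `agent.turn_end` to `onAgentTurnEnd`); `handlerNameFor`, `dispatchTo`, and `SketchRecorder` are hypothetical, and the library's actual discovery logic may differ.

```typescript
// Inferred from the examples above, not from the library's source:
// 'agentfootprint.agent.turn_end' -> 'onAgentTurnEnd'.
function handlerNameFor(eventName: string): string {
  const [, domain = '', eventKey = ''] = eventName.split('.');
  const pascal = (s: string) =>
    s
      .split('_')
      .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
      .join('');
  return `on${pascal(domain)}${pascal(eventKey)}`;
}

type SketchRecorder = { id: string } & Record<string, unknown>;

// Calls the matching handler if the recorder implements it; otherwise
// the event is skipped, so recorders only pay for what they handle.
function dispatchTo(recorder: SketchRecorder, eventName: string, payload: unknown): boolean {
  const handler = recorder[handlerNameFor(eventName)];
  if (typeof handler !== 'function') return false;
  handler.call(recorder, { payload });
  return true;
}

// A recorder that only totals tokens at the end of each turn.
let totalTokens = 0;
const tokenRecorder: SketchRecorder = {
  id: 'token-total',
  onAgentTurnEnd: (e: { payload: { totalInputTokens: number; totalOutputTokens: number } }) => {
    totalTokens += e.payload.totalInputTokens + e.payload.totalOutputTokens;
  },
};

dispatchTo(tokenRecorder, 'agentfootprint.agent.turn_end', {
  totalInputTokens: 900,
  totalOutputTokens: 100,
});
dispatchTo(tokenRecorder, 'agentfootprint.stream.token', { text: 'hi' }); // no handler, skipped
console.log(totalTokens);
```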

OpenTelemetry

otelObservability ships in agentfootprint/observability-providers. It maps agentfootprint events onto OTel spans + attributes via your own @opentelemetry/api tracer. Pair with @opentelemetry/sdk-node and your favorite exporter (Jaeger, Tempo, Datadog OTel collector, etc.):

import { otelObservability } from 'agentfootprint/observability-providers';
import { microtaskBatchDriver } from 'footprintjs/detach';
import { trace } from '@opentelemetry/api';

const stop = agent.enable.observability({
  strategy: otelObservability({
    tracer: trace.getTracer('my-agent', '1.0.0'),
  }),
  detach: { driver: microtaskBatchDriver, mode: 'forget' },
});

Anti-patterns

❌ Don’t log every event: 60+ typed events × N iterations × M agents = your log infra hates you. Filter to the domains you actually need.
❌ Don’t subscribe in a hot loop: runner.on(...) returns an unsubscribe function, so call it when you are done (at the latest on shutdown). Forgotten subscriptions leak listeners across runs.
❌ Don’t use getNarrative() as a log sink: it’s structured trace data, not log lines. Pass the entries to a real log/trace backend if you need to ship them.
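A small helper can make the "always unsubscribe" rule hard to forget. The sketch below is a generic pattern rather than a library feature: `createSubscriptionScope` is a hypothetical name, and the fake emitter exists only so the example runs without the framework.

```typescript
// Stand-in for anything whose `on` returns an unsubscribe function,
// as the text says `runner.on(...)` does. Not the real runner type.
type Unsubscribe = () => void;

function createSubscriptionScope() {
  const unsubscribers: Unsubscribe[] = [];
  return {
    // Register the unsubscribe returned by an `on(...)` call.
    track(unsubscribe: Unsubscribe): void {
      unsubscribers.push(unsubscribe);
    },
    // Detach everything exactly once, newest first.
    dispose(): number {
      let detached = 0;
      while (unsubscribers.length > 0) {
        unsubscribers.pop()?.();
        detached += 1;
      }
      return detached;
    },
  };
}

// Tiny fake emitter so the sketch runs without the framework.
const activeListeners = new Set<(eventName: string) => void>();
const fakeOn = (fn: (eventName: string) => void): Unsubscribe => {
  activeListeners.add(fn);
  return () => void activeListeners.delete(fn);
};

const subscriptionScope = createSubscriptionScope();
subscriptionScope.track(fakeOn((eventName) => console.log(eventName)));
subscriptionScope.track(fakeOn(() => {}));
const detachedCount = subscriptionScope.dispose(); // both listeners are removed
```

With a real runner, the same scope would wrap calls such as `subscriptionScope.track(runner.on('agentfootprint.cost.tick', handler))`, with `dispose()` placed in a `finally` block or a shutdown hook.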

Next steps

Causal memory deep dive — the trace as the agent’s working memory
Reliability gate — agentfootprint.reliability.fail_fast event semantics
Streaming guide — token-by-token rendering via provider.stream()