Observability
Logs that say “agent ran” are useless. Logs that say “agent called redact_pii({text: '...'}) and then injected the result into the messages slot before iteration 3, costing $0.0042” let you debug a hallucination at 2 AM. The difference is whether observability is structured and typed at the source or string-formatted at the sink.
Three workflows from one trace
agentfootprint’s observability isn’t bolted on. The framework owns the loop, so every decision and execution is recorded during DFS traversal, not collected after the fact via spans. That single artifact serves three workflows:
| Mode | What you do | What the trace gives you |
|---|---|---|
| Live | Debug as you build | Exactly which injection produced which token; which predicate fired this iteration; which prefix actually got cached |
| Offline | Monitor what shipped | Replay any past run from its trace. Alert on drift. Attribute cost per injection. |
| Detailed | Improve via export | Every successful trajectory is labeled training data for SFT, DPO, or process-RL — no separate data-collection phase |
And a fourth, novel workflow: the agent can read its own trace. Six months after the agent rejected loan #42, the question “why did you reject it?” is answered from the recorded evidence (creditScore=580, threshold=600), not from a rerun. Causal memory turns the trace into the agent’s working memory.
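To make the "answer from recorded evidence" idea concrete, here is a minimal sketch. The `DecisionRecord` shape, the `explainDecision` helper, and the sample data are hypothetical stand-ins, not agentfootprint's trace format; the point is only that the explanation is read from stored evidence and nothing is re-executed.

```typescript
// Hypothetical trace shape: each recorded decision keeps the evidence it used.
interface DecisionRecord {
  subjectId: string;
  decision: 'approve' | 'reject';
  rule: string;
  evidence: Record<string, number>;
}

// Illustrative data only; a real trace would come from a stored run.
const recordedTrace: DecisionRecord[] = [
  {
    subjectId: 'loan-41',
    decision: 'approve',
    rule: 'creditScore >= threshold',
    evidence: { creditScore: 710, threshold: 600 },
  },
  {
    subjectId: 'loan-42',
    decision: 'reject',
    rule: 'creditScore >= threshold',
    evidence: { creditScore: 580, threshold: 600 },
  },
];

// Answer "why?" by reading the stored record; the agent is not rerun.
function explainDecision(trace: DecisionRecord[], subjectId: string): string {
  const record = trace.find((r) => r.subjectId === subjectId);
  if (!record) return `No recorded decision for ${subjectId}`;
  const facts = Object.entries(record.evidence)
    .map(([key, value]) => `${key}=${value}`)
    .join(', ');
  return `${subjectId}: ${record.decision} (rule: ${record.rule}; evidence: ${facts})`;
}

const loanExplanation = explainDecision(recordedTrace, 'loan-42');
```

Because the evidence travels with the decision, the answer stays stable even if the model, prompt, or threshold has changed since the original run.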
The model
Every runner (Agent, LLMCall, Sequence, etc.) exposes the same observability surface, in three layers:
| Layer | API | When to use |
|---|---|---|
| Events | runner.on('agentfootprint.<domain>.<event>', fn) | You want a typed stream — payloads are auto-completed by your IDE |
| Recorders | runner.attach(recorder) | You want pre-built aggregation (cost totals, token counts, narrative entries) |
| Tier-3 helpers | runner.enable.thinking({...}), runner.enable.logging({...}) | You want a one-liner for the common case (status line, structured logs) |
There are 60+ typed events across 18 domains: agent (incl. output_schema_validation_failed for Instructor-style retry), composition, context, stream, tools, skill, memory, cache, cost, permission (incl. halt for sequence governance), eval, embedding, pause, error, fallback, resilience, reliability, risk. Every event has a typed payload — gen_ai.chat, tool.start, cost.tick, composition.iteration_start, etc. The framework owns the loop and emits during DFS traversal — no instrumentation, no spans-after-the-fact.
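How a name-keyed `on(...)` can give each handler the right payload type is easiest to see in a small stand-alone sketch. Everything below (`SketchEventMap`, `SketchEmitter`, and the payload fields) is an assumption made for illustration; it is not the library's emitter, only one common way to get this behavior in TypeScript.

```typescript
// Hypothetical event map. Names follow agentfootprint.<domain>.<event>,
// but the payload fields are illustrative, not the library's real types.
interface SketchEventMap {
  'agentfootprint.cost.tick': { estimatedUsd: number };
  'agentfootprint.stream.tool_start': { toolName: string };
}

type AnyListener = (payload: unknown) => void;

class SketchEmitter {
  private listeners = new Map<string, Set<AnyListener>>();

  // The payload type is looked up from the event name, so a handler for
  // 'agentfootprint.cost.tick' cannot accidentally read `toolName`.
  on<K extends keyof SketchEventMap>(
    name: K,
    fn: (payload: SketchEventMap[K]) => void,
  ): () => void {
    const set = this.listeners.get(name) ?? new Set<AnyListener>();
    set.add(fn as AnyListener);
    this.listeners.set(name, set);
    // Return an unsubscribe function, mirroring the stop() style in the docs.
    return () => {
      set.delete(fn as AnyListener);
    };
  }

  emit<K extends keyof SketchEventMap>(name: K, payload: SketchEventMap[K]): void {
    this.listeners.get(name)?.forEach((fn) => fn(payload));
  }
}

const sketchEmitter = new SketchEmitter();
let sketchTotalUsd = 0;
const stopCostListener = sketchEmitter.on('agentfootprint.cost.tick', (p) => {
  sketchTotalUsd += p.estimatedUsd; // `p` is inferred as { estimatedUsd: number }
});
sketchEmitter.emit('agentfootprint.cost.tick', { estimatedUsd: 0.25 });
stopCostListener();
sketchEmitter.emit('agentfootprint.cost.tick', { estimatedUsd: 0.5 }); // no listener left
```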
Tier-3 — .enable.thinking() and .enable.logging()
The 80% case: a live status line (“what is the agent doing right now?”) and a domain-filtered log firehose. Both are one-liners.

```typescript
// Live status line: user-facing "what's the agent doing right now".
const stopThinking = agent.enable.thinking({
  onStatus: (status) => console.log(` ⎈ ${status}`),
});
```

.enable.thinking({ onStatus }) gives you a Claude-Code-style live status line: one string per agent state transition. Use it for terminal UIs, CLI progress, or chat-UI typing indicators.

```typescript
// Firehose logging filtered to stream + agent domains. The logger
// object wraps console, pino, winston, etc.: any object with a
// `log(message, data?)` method.
const stopLogging = agent.enable.logging({
  domains: [LoggingDomains.STREAM, LoggingDomains.AGENT],
  logger: {
    log: (message) => console.log(` [log] ${message}`),
  },
});
```

.enable.logging({ domains, logger }) gives you structured logs filtered to specific event domains. The logger is any object with a log(message, data?) method, so you can drop in console, pino, winston, or your own.
Both return a stop() function — call it on shutdown to detach the listener cleanly.
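The domain filter plus stop() contract described above can be sketched in a few lines. This is not the library's implementation: `domainOf`, `makeDomainFilter`, and `SketchLogger` are hypothetical names, and the only thing taken from the docs is the agentfootprint.<domain>.<event> naming pattern and the `log(message, data?)` logger shape.

```typescript
// Same logger contract as described above: any object with log(message, data?).
interface SketchLogger {
  log(message: string, data?: unknown): void;
}

// Extract <domain> from 'agentfootprint.<domain>.<event>'.
function domainOf(eventName: string): string | undefined {
  const parts = eventName.split('.');
  return parts.length >= 3 && parts[0] === 'agentfootprint' ? parts[1] : undefined;
}

// Forward only events from the allowed domains; stop() detaches the filter.
function makeDomainFilter(domains: string[], logger: SketchLogger) {
  const allowed = new Set(domains);
  let active = true;
  const handle = (eventName: string, payload?: unknown): void => {
    const domain = domainOf(eventName);
    if (active && domain !== undefined && allowed.has(domain)) {
      logger.log(eventName, payload);
    }
  };
  const stop = (): void => {
    active = false;
  };
  return { handle, stop };
}

const capturedLogLines: string[] = [];
const domainFilter = makeDomainFilter(['stream', 'agent'], {
  log: (message) => {
    capturedLogLines.push(message);
  },
});
domainFilter.handle('agentfootprint.stream.token');
domainFilter.handle('agentfootprint.cost.tick'); // filtered out: wrong domain
domainFilter.handle('agentfootprint.agent.turn_end');
domainFilter.stop();
domainFilter.handle('agentfootprint.agent.turn_start'); // ignored after stop()
```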
Tier-3 — .enable.observability() — grouped vendor strategies
For shipping events to a vendor backend (CloudWatch, X-Ray, OTel, AgentCore), .enable.observability(...) accepts a typed strategy plus an optional detach driver. The strategy is the WHERE; the detach is the HOW (sync inline vs fire-and-forget):

```typescript
import { agentcoreObservability } from 'agentfootprint/observability-providers';
import { microtaskBatchDriver } from 'footprintjs/detach';

const stop = agent.enable.observability({
  strategy: agentcoreObservability({
    region: 'us-east-1',
    logGroupName: '/agentfootprint/my-agent',
  }),
  // Recommended for production: slow exporters never block the agent loop.
  // The strategy's `exportEvent(event)` runs on the driver's schedule
  // instead of inline on the dispatcher's hot path.
  detach: { driver: microtaskBatchDriver, mode: 'forget' },
});
```

Detach modes
| detach.mode | Returns | Use when |
|---|---|---|
| omitted (no detach) | sync | Strategy is fast (< 100µs). Default, back-compat. |
| 'forget' (default when detach set) | void | Pure fire-and-forget telemetry. |
| 'join-later' | onHandle(h) callback fires per event | You want to await exports later (tests, backpressure). |
Drivers
Pick by environment; see footprintjs/detach for the full list. All drivers are exported from 'footprintjs/detach':
| Driver | Best for |
|---|---|
| microtaskBatchDriver | Default. Cross-runtime. Lowest latency. |
| setImmediateDriver | Node only. Yields to I/O before flushing. |
| setTimeoutDriver | Cross-runtime, configurable delay. |
| sendBeaconDriver | Browser. Survives page-unload (analytics). |
| workerThreadDriver | CPU-isolated. For heavy serialization. |
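As a rough mental model of what a microtask-batching driver in 'forget' mode does, consider the sketch below. It is an assumption-laden illustration, not the footprintjs implementation: `MicrotaskBatcher`, `enqueue`, and `flushNow` are hypothetical names. The hot path only pushes onto an array; the export runs once per burst, after the current synchronous work has finished.

```typescript
type ExportBatchFn<T> = (batch: T[]) => void;

class MicrotaskBatcher<T> {
  private queue: T[] = [];
  private scheduled = false;
  private readonly exportBatch: ExportBatchFn<T>;

  constructor(exportBatch: ExportBatchFn<T>) {
    this.exportBatch = exportBatch;
  }

  // Hot path: O(1) push, and at most one microtask scheduled per burst.
  enqueue(item: T): void {
    this.queue.push(item);
    if (!this.scheduled) {
      this.scheduled = true;
      void Promise.resolve().then(() => this.flushNow());
    }
  }

  // Drain everything queued so far; also callable directly at shutdown.
  flushNow(): void {
    this.scheduled = false;
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    try {
      this.exportBatch(batch);
    } catch {
      // 'forget' semantics in this sketch: exporter errors never reach the caller.
    }
  }
}

const exportedBatches: string[][] = [];
const batcher = new MicrotaskBatcher<string>((batch) => {
  exportedBatches.push(batch);
});
batcher.enqueue('a');
batcher.enqueue('b');
const batchesBeforeFlush = exportedBatches.length; // nothing was exported inline
batcher.flushNow(); // explicit drain; the scheduled microtask later finds an empty queue
```

The design choice worth noticing is that both events land in a single batch, so a slow exporter is paid for once per burst instead of once per event.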
Graceful shutdown
flushAllDetached() (from 'footprintjs/detach') drains every in-flight handle process-wide:

```typescript
import { flushAllDetached } from 'footprintjs/detach';

process.on('SIGTERM', async () => {
  const stats = await flushAllDetached({ timeoutMs: 10_000 });
  console.log(`Drained ${stats.done}, failed ${stats.failed}, pending ${stats.pending}`);
  process.exit(stats.pending === 0 ? 0 : 1);
});
```

Vendor adapters: agentfootprint/observability-providers
Vendor strategies ship under one grouped subpath (parallel-providers pattern, mirrors llm-providers / tool-providers / memory-providers):
| Adapter | Status | Peer dep (optional) |
|---|---|---|
| agentcoreObservability | ✅ shipped | @aws-sdk/client-cloudwatch-logs |
| cloudwatchObservability | ✅ shipped | @aws-sdk/client-cloudwatch-logs |
| xrayObservability | ✅ shipped | @aws-sdk/client-xray |
| otelObservability | ✅ shipped | @opentelemetry/api + @opentelemetry/sdk-node |
| datadogObservability | roadmap | dd-trace (use otelObservability with Datadog OTel collector for now) |
All peer deps are declared optional — consumers who never call a particular factory don’t need to install its SDK. Each adapter lazy-imports its SDK at first use.
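The lazy-import behavior can be pictured as a memoized loader that runs on the first export rather than when the strategy is constructed. The sketch below uses a synchronous fake SDK so it runs anywhere; `lazyOnce`, `getFakeSdk`, and `makeLazyStrategy` are hypothetical names, and a real adapter would presumably await a dynamic `import()` of its peer dependency instead.

```typescript
// Memoize a loader so the dependency is resolved on first use, not at import time.
function lazyOnce<T>(load: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) cached = { value: load() };
    return cached.value;
  };
}

let sdkLoadCount = 0;

// Stand-in for loading an optional peer dependency (assumption: the real
// adapters do this with an async dynamic import of their SDK).
const getFakeSdk = lazyOnce(() => {
  sdkLoadCount += 1;
  return { send: (line: string): number => line.length };
});

// Constructing the strategy does not touch the SDK; only exportEvent does.
function makeLazyStrategy() {
  return {
    exportEvent: (line: string): number => getFakeSdk().send(line),
  };
}

const lazyStrategy = makeLazyStrategy();
const loadsAfterConstruction = sdkLoadCount; // still 0: nothing loaded yet
lazyStrategy.exportEvent('hello');
lazyStrategy.exportEvent('world'); // reuses the cached SDK
```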
enable.cost(...) accepts the same detach option for the same reasons. enable.thinking and enable.lens deliberately stay sync — UI render must feel responsive.
Cost tracking
Pass a pricingTable to the Agent constructor and the framework emits agentfootprint.cost.tick after every LLM call with per-call and cumulative USD. Add a costBudget and you also get a one-shot cost.limit_hit the first time the cumulative total crosses the budget. The library never auto-aborts; you decide what to do (cancel via signal, escalate, log, etc.).
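The arithmetic behind a per-call tick and a one-shot budget signal can be worked through without the library. The sketch below is illustrative only: `CostMeter`, the per-million-token pricing unit, and the strict `>` comparison for "crossing" are assumptions, not the framework's documented pricingTable format or semantics.

```typescript
// Assumed unit: USD per one million tokens.
interface SketchPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface SketchUsage {
  inputTokens: number;
  outputTokens: number;
}

class CostMeter {
  private cumulativeUsd = 0;
  private limitReported = false;
  private readonly pricing: SketchPricing;
  private readonly budgetUsd: number;

  constructor(pricing: SketchPricing, budgetUsd: number) {
    this.pricing = pricing;
    this.budgetUsd = budgetUsd;
  }

  // Returns the per-call cost, the running total, and whether this call is
  // the first one to push the total over the budget.
  record(usage: SketchUsage): { estimatedUsd: number; cumulativeUsd: number; limitHit: boolean } {
    const estimatedUsd =
      (usage.inputTokens * this.pricing.inputPerMTok +
        usage.outputTokens * this.pricing.outputPerMTok) /
      1_000_000;
    this.cumulativeUsd += estimatedUsd;
    const limitHit = this.cumulativeUsd > this.budgetUsd && !this.limitReported;
    if (limitHit) this.limitReported = true; // one-shot: never reported again
    return { estimatedUsd, cumulativeUsd: this.cumulativeUsd, limitHit };
  }
}

// 1000 input tokens at $3/MTok plus 200 output tokens at $15/MTok = $0.006 per call.
const costMeter = new CostMeter({ inputPerMTok: 3, outputPerMTok: 15 }, 0.01);
const costTicks = [1, 2, 3].map(() => costMeter.record({ inputTokens: 1000, outputTokens: 200 }));
const limitFlags = costTicks.map((tick) => tick.limitHit);
```

With a $0.01 budget, the second call is the one that crosses it, and the third call stays quiet because the signal has already fired. Note that the meter only reports; stopping the run remains the caller's decision.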
```typescript
// 'feature' kind drives the smart tool-call flow. Cost ticks fire
// automatically off the per-iteration usage MockProvider estimates
// (chars/4), which is sufficient to demo the budget crossing.
const agent = Agent.create({
  provider: provider ?? exampleProvider('feature'),
  model: 'demo-sonnet',
  pricingTable: pricing,
  costBudget: 0.0001, // trip the warning
})
  .system('')
  .tool({
    schema: { name: 'noop', description: '', inputSchema: { type: 'object' } },
    execute: () => 'ok',
  })
  .build();

agent.on('agentfootprint.cost.tick', (e) => {
  const p = e.payload;
  console.log(
    `[tick] +$${p.estimatedUsd.toFixed(6)}, cumulative $${p.cumulative.estimatedUsd.toFixed(6)}`,
  );
});
```

Event taxonomy
agentfootprint.<domain>.<event>
| Domain | Sample events | What it covers |
|---|---|---|
| agent | turn_start · turn_end · iteration_start · iteration_end · route_decided · handoff · output_schema_validation_failed | ReAct loop boundaries; output_schema_validation_failed fires inside the reliability gate when the LLM’s final answer fails the agent’s outputSchema (v2.13+) |
| composition | enter · exit · iteration_start · iteration_exit · fork_start · branch_complete · merge_end · route_decided | Sequence / Parallel / Conditional / Loop lifecycles |
| context | injected · evaluated · evicted · slot_composed · budget_pressure · memory | Context engineering: every injection’s full lifecycle |
| stream | llm_start · llm_end · token · tool_start · tool_end | Token-by-token + tool-call lifecycle |
| tools | offered · activated · deactivated · discovery_started · discovery_completed · discovery_failed | ToolProvider activation per iteration (skills, gating); async-provider lifecycle (start/complete/fail with durationMs) |
| skill | activated · deactivated | LLM-activated skill resolution |
| memory | attached · detached · written · strategy_applied | Memory pipeline subflows |
| cache | applied · metrics | Provider cache marker placement + hit-rate |
| cost | tick · limit_hit | Per-call USD + budget threshold crossings |
| permission | check · gate_opened · gate_closed · halt | PermissionChecker decisions; halt fires when a checker terminates the run via sequence governance (v2.12+) |
| eval | score · threshold_crossed | Custom quality / guardrail signals |
| embedding | generated | Embedder calls (RAG / semantic memory) |
| pause | request · resume | Human-in-the-loop checkpoints |
| error | fatal · recovered · retried | Stage-level error events |
| fallback | triggered | Schema-validation fallback chain |
| resilience | circuit_state_changed · retry · fallback · output_fallback_triggered · output_canned_used | v2.10.x reliability primitives |
| reliability | fail_fast | v2.11.5 reliability gate fail-fast events |
| risk | flagged | Eval guardrails flagging suspect output |
Every event is exhaustively typed via AgentfootprintEvent (a discriminated union); your IDE autocompletes the payload shape based on the event name. No string-parsing logs. No “I think this means the tool was called.”
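What "exhaustively typed via a discriminated union" buys you is compiler-checked narrowing. The sketch below uses a three-member stand-in union; `SketchEvent` and the `toolName` field are hypothetical, while `estimatedUsd` and `iterationCount` echo payload fields mentioned elsewhere on this page. The real AgentfootprintEvent union is far larger.

```typescript
// Stand-in for a discriminated union keyed on the event name.
type SketchEvent =
  | { name: 'agentfootprint.cost.tick'; payload: { estimatedUsd: number } }
  | { name: 'agentfootprint.stream.tool_start'; payload: { toolName: string } }
  | { name: 'agentfootprint.agent.turn_end'; payload: { iterationCount: number } };

function describeEvent(event: SketchEvent): string {
  switch (event.name) {
    case 'agentfootprint.cost.tick':
      // Narrowed: payload is { estimatedUsd: number } here.
      return `cost +$${event.payload.estimatedUsd.toFixed(4)}`;
    case 'agentfootprint.stream.tool_start':
      return `tool ${event.payload.toolName} started`;
    case 'agentfootprint.agent.turn_end':
      return `turn ended after ${event.payload.iterationCount} iteration(s)`;
    default: {
      // If a new member is added to SketchEvent and not handled above,
      // this assignment stops compiling.
      const unreachable: never = event;
      return unreachable;
    }
  }
}

const costLine = describeEvent({
  name: 'agentfootprint.cost.tick',
  payload: { estimatedUsd: 0.0042 },
});
const toolLine = describeEvent({
  name: 'agentfootprint.stream.tool_start',
  payload: { toolName: 'redact_pii' },
});
```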
Custom recorders
For aggregations the built-ins don’t cover, write your own. A recorder is just an object with on* methods matching the event names:

```typescript
const myRecorder = {
  id: 'my-recorder',
  onContextInjected: (e) => {
    // e.payload is fully typed: { source, slot, role?, content?, ... }
  },
  onAgentTurnEnd: (e) => {
    // e.payload: { iterationCount, totalInputTokens, totalOutputTokens, ... }
  },
};

runner.attach(myRecorder);
```

The recorder discovers handler methods via runtime method-shape detection: implement only the events you care about and ignore the rest.
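One plausible way to implement method-shape detection is to derive a handler name from the event name and call it only if the recorder defines a function under that name. The mapping rule below (domain and event converted to PascalCase after `on`) is an assumption inferred from the onContextInjected and onAgentTurnEnd examples; `handlerNameFor` and `dispatchToRecorder` are hypothetical helpers, not library APIs.

```typescript
// 'agentfootprint.context.injected' -> 'onContextInjected'
// 'agentfootprint.agent.turn_end'   -> 'onAgentTurnEnd'
function handlerNameFor(eventName: string): string {
  const [, domain = '', event = ''] = eventName.split('.');
  const pascal = (s: string): string =>
    s
      .split('_')
      .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
      .join('');
  return `on${pascal(domain)}${pascal(event)}`;
}

type SketchRecorder = { id: string } & Record<string, unknown>;

// Call the matching handler if the recorder has one; report whether it did.
function dispatchToRecorder(
  recorder: SketchRecorder,
  eventName: string,
  payload: unknown,
): boolean {
  const candidate = recorder[handlerNameFor(eventName)];
  if (typeof candidate !== 'function') return false; // recorder ignores this event
  candidate.call(recorder, { name: eventName, payload });
  return true;
}

const seenSources: string[] = [];
const sketchRecorder: SketchRecorder = {
  id: 'my-recorder',
  onContextInjected: (e: { payload: { source: string } }) => {
    seenSources.push(e.payload.source);
  },
};

const handledInjected = dispatchToRecorder(sketchRecorder, 'agentfootprint.context.injected', {
  source: 'rag',
});
const handledTick = dispatchToRecorder(sketchRecorder, 'agentfootprint.cost.tick', {
  estimatedUsd: 0.001,
});
```

Under this scheme a recorder that implements one handler pays nothing for the events it does not name, which matches the "implement only what you care about" guidance.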
OpenTelemetry
otelObservability ships in agentfootprint/observability-providers. It maps agentfootprint events onto OTel spans + attributes via your own @opentelemetry/api tracer. Pair with @opentelemetry/sdk-node and your favorite exporter (Jaeger, Tempo, Datadog OTel collector, etc.):

```typescript
import { otelObservability } from 'agentfootprint/observability-providers';
import { microtaskBatchDriver } from 'footprintjs/detach';
import { trace } from '@opentelemetry/api';

const stop = agent.enable.observability({
  strategy: otelObservability({
    tracer: trace.getTracer('my-agent', '1.0.0'),
  }),
  detach: { driver: microtaskBatchDriver, mode: 'forget' },
});
```

Anti-patterns
- ❌ Don’t log every event: 60+ typed events × N iterations × M agents = your log infra hates you. Filter to the domains you actually need.
- ❌ Don’t subscribe in a hot loop: runner.on(...) returns an unsubscribe; call it on shutdown. Forgotten subscriptions leak listeners across runs.
- ❌ Don’t use getNarrative() as a log sink: it’s structured trace data, not log lines. Pass the entries to a real log/trace backend if you need to ship them.
Next steps
- Causal memory deep dive: the trace as the agent’s working memory
- Reliability gate: agentfootprint.reliability.fail_fast event semantics
- Streaming guide: token-by-token rendering via provider.stream()