Observability

Logs that say “agent ran” are useless. Logs that say “agent called redact_pii({text: '...'}) and then injected the result into the messages slot before iteration 3, costing $0.0042” let you debug a hallucination at 2 AM. The difference is whether observability is structured + typed at the source or string-formatted at the sink.

agentfootprint’s observability isn’t bolted on. The framework owns the loop, so every decision and execution is recorded during DFS traversal — not collected after the fact via spans. That single artifact serves three workflows:

| Mode | What you do | What the trace gives you |
| --- | --- | --- |
| Live | Debug as you build | Exactly which injection produced which token; which predicate fired this iteration; which prefix actually got cached |
| Offline | Monitor what shipped | Replay any past run from its trace. Alert on drift. Attribute cost per injection. |
| Detailed | Improve via export | Every successful trajectory is labeled training data for SFT, DPO, or process-RL, with no separate data-collection phase |

And a fourth, novel workflow: the agent can read its own trace. Six months after the agent rejected loan #42, “why did you reject it?” is answered from the recorded evidence (creditScore=580, threshold=600), not from a rerun. Causal memory turns the trace into the agent’s working memory.

One agent run produces a JSON-portable causal trace. Three downstream consumers fan out: audit replay, cheap-model triage, training data export. The fourth, novel: the agent can read its own trace.

Every runner (Agent, LLMCall, Sequence, etc.) exposes the same observability surface, in three layers:

| Layer | API | When to use |
| --- | --- | --- |
| Events | runner.on('agentfootprint.<domain>.<event>', fn) | You want a typed stream; payloads are auto-completed by your IDE |
| Recorders | runner.attach(recorder) | You want pre-built aggregation (cost totals, token counts, narrative entries) |
| Tier-3 helpers | runner.enable.thinking({...}), runner.enable.logging({...}) | You want a one-liner for the common case (status line, structured logs) |

There are 60+ typed events across 18 domains: agent (incl. output_schema_validation_failed for Instructor-style retry), composition, context, stream, tools, skill, memory, cache, cost, permission (incl. halt for sequence governance), eval, embedding, pause, error, fallback, resilience, reliability, risk. Every event has a typed payload — gen_ai.chat, tool.start, cost.tick, composition.iteration_start, etc. The framework owns the loop and emits during DFS traversal — no instrumentation, no spans-after-the-fact.

Tier-3 — .enable.thinking() and .enable.logging()

The 80% case: a live status line (“what is the agent doing right now?”) and a domain-filtered log firehose. Both are one-liners.

examples/features/04-observability.ts (region: enable-thinking)
    // Live status line: user-facing "what's the agent doing right now".
    const stopThinking = agent.enable.thinking({
      onStatus: (status) => console.log(`${status}`),
    });

.enable.thinking({ onStatus }) gives you a Claude-Code-style live status line — one string per agent state transition. Use for terminal UIs, CLI progress, or chat-UI typing indicators.

examples/features/04-observability.ts (region: enable-logging)
    // Firehose logging filtered to stream + agent domains. The logger
    // object wraps console, pino, winston, etc.: any object with a
    // `log(message, data?)` method.
    const stopLogging = agent.enable.logging({
      domains: [LoggingDomains.STREAM, LoggingDomains.AGENT],
      logger: {
        log: (message) => console.log(` [log] ${message}`),
      },
    });

.enable.logging({ domains, logger }) gives you structured logs filtered to specific event domains. The logger is any object with a log(message, data?) method — drop in console, pino, winston, or your own.

Both return a stop() function — call it on shutdown to detach the listener cleanly.
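To make the domain filter and the logger contract concrete, here is a minimal sketch of how a helper of this kind could work. TinyDispatcher, domainOf, and this enableLogging function are hypothetical stand-ins written for illustration, not agentfootprint's implementation; only the log(message, data?) shape, the agentfootprint.<domain>.<event> naming, and the returned stop() function come from the text above.

```typescript
// Hypothetical shapes for illustration; not agentfootprint's actual types.
type AgentEvent = { name: string; payload: unknown };
type Listener = (event: AgentEvent) => void;
interface Logger {
  log(message: string, data?: unknown): void;
}

// Minimal stand-in for a runner's event dispatcher.
class TinyDispatcher {
  private listeners: Listener[] = [];

  // Registers a listener and returns a function that detaches it.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  emit(name: string, payload: unknown): void {
    this.listeners.forEach((listener) => listener({ name, payload }));
  }
}

// Extracts "<domain>" from "agentfootprint.<domain>.<event>".
function domainOf(eventName: string): string | undefined {
  const parts = eventName.split('.');
  return parts.length >= 3 && parts[0] === 'agentfootprint' ? parts[1] : undefined;
}

// Forwards only events whose domain is listed; returns a stop() function.
function enableLogging(
  dispatcher: TinyDispatcher,
  options: { domains: string[]; logger: Logger },
): () => void {
  return dispatcher.subscribe((event) => {
    const domain = domainOf(event.name);
    if (domain !== undefined && options.domains.indexOf(domain) !== -1) {
      options.logger.log(event.name, event.payload);
    }
  });
}

// Usage: collect log lines in memory instead of printing them.
const logged: string[] = [];
const dispatcher = new TinyDispatcher();
const stopDemoLogging = enableLogging(dispatcher, {
  domains: ['stream', 'agent'],
  logger: { log: (message) => { logged.push(message); } },
});

dispatcher.emit('agentfootprint.stream.token', { content: 'hi' }); // logged
dispatcher.emit('agentfootprint.cost.tick', { estimatedUsd: 0.0042 }); // filtered out
stopDemoLogging();
dispatcher.emit('agentfootprint.agent.turn_end', {}); // ignored: listener detached
```

Because the logger is just an object with a log method, swapping the in-memory array for console, pino, or winston only changes the body of log.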

Tier-3 — .enable.observability() — grouped vendor strategies

For shipping events to a vendor backend (CloudWatch, X-Ray, OTel, AgentCore), .enable.observability(...) accepts a typed strategy + optional detach driver. The strategy is the WHERE; the detach is the HOW (sync inline vs fire-and-forget):

    import { agentcoreObservability } from 'agentfootprint/observability-providers';
    import { microtaskBatchDriver } from 'footprintjs/detach';

    const stop = agent.enable.observability({
      strategy: agentcoreObservability({
        region: 'us-east-1',
        logGroupName: '/agentfootprint/my-agent',
      }),
      // Recommended for production: slow exporters never block the agent loop.
      // The strategy's `exportEvent(event)` runs on the driver's schedule
      // instead of inline on the dispatcher's hot path.
      detach: { driver: microtaskBatchDriver, mode: 'forget' },
    });

| detach.mode | Returns | Use when |
| --- | --- | --- |
| omitted (no detach) | sync | Strategy is fast (< 100µs). Default, back-compat. |
| 'forget' (default when detach set) | void | Pure fire-and-forget telemetry. |
| 'join-later' | onHandle(h) callback fires per event | You want to await exports later (tests, backpressure). |

Pick a driver by environment. All of the following are exported from 'footprintjs/detach'; see that module for the full list:

| Driver | Best for |
| --- | --- |
| microtaskBatchDriver | Default. Cross-runtime. Lowest latency. |
| setImmediateDriver | Node only. Yields to I/O before flushing. |
| setTimeoutDriver | Cross-runtime, configurable delay. |
| sendBeaconDriver | Browser. Survives page-unload (analytics). |
| workerThreadDriver | CPU-isolated. For heavy serialization. |
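To show why a detached driver keeps slow exporters off the hot path, here is a rough sketch of microtask batching in 'forget' mode. MicrotaskBatcher and its submit, flush, and failed members are hypothetical names invented for this example; the real microtaskBatchDriver in footprintjs/detach may be structured quite differently.

```typescript
// Hypothetical sketch of microtask batching; not the footprintjs/detach implementation.
type ExportFn<E> = (event: E) => void | Promise<void>;

class MicrotaskBatcher<E> {
  private queue: E[] = [];
  private scheduled = false;
  private exportEvent: ExportFn<E>;
  failed = 0;

  constructor(exportEvent: ExportFn<E>) {
    this.exportEvent = exportEvent;
  }

  // Called on the hot path: a cheap push, no export work happens inline.
  submit(event: E): void {
    this.queue.push(event);
    if (!this.scheduled) {
      this.scheduled = true;
      // Promise.resolve().then(...) schedules a microtask on any JS runtime.
      Promise.resolve().then(() => this.flush());
    }
  }

  // Drains the current batch. Failures are counted, never rethrown ('forget' semantics).
  flush(): void {
    this.scheduled = false;
    const batch = this.queue;
    this.queue = [];
    for (const event of batch) {
      try {
        const result = this.exportEvent(event);
        if (result instanceof Promise) {
          result.catch(() => { this.failed += 1; });
        }
      } catch {
        this.failed += 1;
      }
    }
  }
}

// Usage: the exporter throws for one event to show that the loop is unaffected.
const exported: string[] = [];
const batcher = new MicrotaskBatcher<string>((event) => {
  if (event === 'bad') throw new Error('exporter down');
  exported.push(event);
});

batcher.submit('a');
batcher.submit('bad');
batcher.submit('b');
const exportedInline = exported.length; // nothing has been exported yet

batcher.flush(); // normally triggered by the scheduled microtask
```

The trade-off is the usual one for fire-and-forget telemetry: the caller never waits and never sees exporter errors, so anything still queued at process exit has to be drained explicitly.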

flushAllDetached() (from 'footprintjs/detach') drains every in-flight handle process-wide:

    import { flushAllDetached } from 'footprintjs/detach';

    process.on('SIGTERM', async () => {
      const stats = await flushAllDetached({ timeoutMs: 10_000 });
      console.log(`Drained ${stats.done}, failed ${stats.failed}, pending ${stats.pending}`);
      process.exit(stats.pending === 0 ? 0 : 1);
    });

Vendor adapters — agentfootprint/observability-providers

Vendor strategies ship under one grouped subpath (parallel-providers pattern, mirrors llm-providers / tool-providers / memory-providers):

| Adapter | Status | Peer dep (optional) |
| --- | --- | --- |
| agentcoreObservability | ✅ shipped | @aws-sdk/client-cloudwatch-logs |
| cloudwatchObservability | ✅ shipped | @aws-sdk/client-cloudwatch-logs |
| xrayObservability | ✅ shipped | @aws-sdk/client-xray |
| otelObservability | ✅ shipped | @opentelemetry/api + @opentelemetry/sdk-node |
| datadogObservability | roadmap | dd-trace (use otelObservability with Datadog OTel collector for now) |

All peer deps are declared optional — consumers who never call a particular factory don’t need to install its SDK. Each adapter lazy-imports its SDK at first use.

enable.cost(...) accepts the same detach option for the same reasons. enable.thinking and enable.lens deliberately stay sync — UI render must feel responsive.

Pass a pricingTable to the Agent constructor and the framework emits agentfootprint.cost.tick after every LLM call with per-call and cumulative USD. Add a costBudget and you also get a one-shot cost.limit_hit the first time cumulative crosses the budget. The library never auto-aborts — you decide what to do (cancel via signal, escalate, log, etc.):

examples/features/02-cost-tracking.ts (region: cost-tracking)
    // 'feature' kind drives the smart tool-call flow. Cost ticks fire
    // automatically off the per-iteration usage MockProvider estimates
    // (chars/4), which is sufficient to demo the budget crossing.
    const agent = Agent.create({
      provider: provider ?? exampleProvider('feature'),
      model: 'demo-sonnet',
      pricingTable: pricing,
      costBudget: 0.0001, // trip the warning
    })
      .system('')
      .tool({
        schema: { name: 'noop', description: '', inputSchema: { type: 'object' } },
        execute: () => 'ok',
      })
      .build();

    agent.on('agentfootprint.cost.tick', (e) => {
      const p = e.payload;
      console.log(
        `[tick] +$${p.estimatedUsd.toFixed(6)} (cumulative $${p.cumulative.estimatedUsd.toFixed(6)})`,
      );
    });
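The budget semantics described above (a tick per call with per-call and cumulative USD, a one-shot limit_hit, and no automatic abort) can be sketched in isolation. CostMeter, the PricingTable layout, and the per-million-token price fields are assumptions made for this example and are not the library's actual types; only the estimatedUsd and cumulative.estimatedUsd payload fields are taken from the example above.

```typescript
// Hypothetical pricing layout: USD per million tokens, keyed by model name.
interface ModelPrice { inputPerMTok: number; outputPerMTok: number; }
type PricingTable = Record<string, ModelPrice>;
interface Usage { inputTokens: number; outputTokens: number; }
interface CostTick { estimatedUsd: number; cumulative: { estimatedUsd: number }; }
interface CostHandlers {
  onTick: (tick: CostTick) => void;
  onLimitHit: (tick: CostTick) => void;
}

class CostMeter {
  private cumulativeUsd = 0;
  private limitHitFired = false;
  private pricing: PricingTable;
  private budgetUsd: number | undefined;
  private handlers: CostHandlers;

  constructor(pricing: PricingTable, budgetUsd: number | undefined, handlers: CostHandlers) {
    this.pricing = pricing;
    this.budgetUsd = budgetUsd;
    this.handlers = handlers;
  }

  // Call once per LLM response; emits a tick and, at most once, a limit hit.
  record(model: string, usage: Usage): CostTick {
    const price = this.pricing[model];
    if (!price) throw new Error(`No pricing entry for model "${model}"`);
    const estimatedUsd =
      (usage.inputTokens * price.inputPerMTok + usage.outputTokens * price.outputPerMTok) / 1_000_000;
    this.cumulativeUsd += estimatedUsd;
    const tick: CostTick = { estimatedUsd, cumulative: { estimatedUsd: this.cumulativeUsd } };
    this.handlers.onTick(tick);
    if (!this.limitHitFired && this.budgetUsd !== undefined && this.cumulativeUsd > this.budgetUsd) {
      this.limitHitFired = true; // one-shot: later calls never fire it again
      this.handlers.onLimitHit(tick);
    }
    return tick;
  }
}

// Usage: the meter only reports; the caller decides to cancel via a signal.
const controller = new AbortController();
const ticks: CostTick[] = [];
let limitHits = 0;

const meter = new CostMeter(
  { 'demo-sonnet': { inputPerMTok: 3, outputPerMTok: 15 } }, // made-up prices
  0.0001,
  {
    onTick: (tick) => { ticks.push(tick); },
    onLimitHit: () => { limitHits += 1; controller.abort(); },
  },
);

for (let call = 0; call < 3; call += 1) {
  meter.record('demo-sonnet', { inputTokens: 10, outputTokens: 2 }); // $0.00006 per call
}
```

Keeping the abort decision in the caller's onLimitHit handler mirrors the "never auto-aborts" rule: the meter has no knowledge of signals, retries, or escalation paths.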

Event names follow the pattern agentfootprint.<domain>.<event>:

| Domain | Sample events | What it covers |
| --- | --- | --- |
| agent | turn_start · turn_end · iteration_start · iteration_end · route_decided · handoff · output_schema_validation_failed | ReAct loop boundaries; output_schema_validation_failed fires inside the reliability gate when the LLM’s final answer fails the agent’s outputSchema (v2.13+) |
| composition | enter · exit · iteration_start · iteration_exit · fork_start · branch_complete · merge_end · route_decided | Sequence / Parallel / Conditional / Loop lifecycles |
| context | injected · evaluated · evicted · slot_composed · budget_pressure · memory | Context engineering: every injection’s full lifecycle |
| stream | llm_start · llm_end · token · tool_start · tool_end | Token-by-token + tool-call lifecycle |
| tools | offered · activated · deactivated · discovery_started · discovery_completed · discovery_failed | ToolProvider activation per iteration (skills, gating); async-provider lifecycle (start/complete/fail with durationMs) |
| skill | activated · deactivated | LLM-activated skill resolution |
| memory | attached · detached · written · strategy_applied | Memory pipeline subflows |
| cache | applied · metrics | Provider cache marker placement + hit-rate |
| cost | tick · limit_hit | Per-call USD + budget threshold crossings |
| permission | check · gate_opened · gate_closed · halt | PermissionChecker decisions; halt fires when a checker terminates the run via sequence governance (v2.12+) |
| eval | score · threshold_crossed | Custom quality / guardrail signals |
| embedding | generated | Embedder calls (RAG / semantic memory) |
| pause | request · resume | Human-in-the-loop checkpoints |
| error | fatal · recovered · retried | Stage-level error events |
| fallback | triggered | Schema-validation fallback chain |
| resilience | circuit_state_changed · retry · fallback · output_fallback_triggered · output_canned_used | v2.10.x reliability primitives |
| reliability | fail_fast | v2.11.5 reliability gate fail-fast events |
| risk | flagged | Eval guardrails flagging suspect output |

Every event is exhaustively typed via AgentfootprintEvent (a discriminated union); your IDE autocompletes the payload shape based on the event name. No string-parsing logs. No “I think this means the tool was called.”
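For readers unfamiliar with the pattern, the following sketch shows how a discriminated union lets the event name select the payload type. DemoEvent, TypedEmitter, the type discriminant field, and the token payload's content field are hypothetical and heavily trimmed; the real AgentfootprintEvent union may use different field names and covers far more events.

```typescript
// Hypothetical two-member stand-in for a much larger event union.
type DemoEvent =
  | {
      type: 'agentfootprint.cost.tick';
      payload: { estimatedUsd: number; cumulative: { estimatedUsd: number } };
    }
  | { type: 'agentfootprint.stream.token'; payload: { content: string } };

type EventName = DemoEvent['type'];
// Picks the union member whose `type` matches the given name.
type EventOf<N extends EventName> = Extract<DemoEvent, { type: N }>;

class TypedEmitter {
  private listeners: Array<(event: DemoEvent) => void> = [];

  // The handler's parameter type is derived from `name`, so the IDE can
  // autocomplete `event.payload` without any manual annotation.
  on<N extends EventName>(name: N, handler: (event: EventOf<N>) => void): () => void {
    const wrapped = (event: DemoEvent): void => {
      if (event.type === name) handler(event as EventOf<N>);
    };
    this.listeners.push(wrapped);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== wrapped);
    };
  }

  emit(event: DemoEvent): void {
    this.listeners.forEach((listener) => listener(event));
  }
}

// Usage: `e.payload.cumulative` type-checks only for the cost.tick event.
const emitter = new TypedEmitter();
let lastCumulativeUsd = 0;
const unsubscribe = emitter.on('agentfootprint.cost.tick', (e) => {
  lastCumulativeUsd = e.payload.cumulative.estimatedUsd;
});

emitter.emit({
  type: 'agentfootprint.cost.tick',
  payload: { estimatedUsd: 0.0042, cumulative: { estimatedUsd: 0.0042 } },
});
emitter.emit({ type: 'agentfootprint.stream.token', payload: { content: 'hi' } });
unsubscribe();
emitter.emit({
  type: 'agentfootprint.cost.tick',
  payload: { estimatedUsd: 0.01, cumulative: { estimatedUsd: 0.0142 } },
});
```

Misspelling the event name or reading a field that belongs to another event's payload becomes a compile-time error instead of a silent undefined at runtime.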

For aggregations the built-ins don’t cover, write your own. A recorder is just an object with on* methods matching the event names:

    const myRecorder = {
      id: 'my-recorder',
      onContextInjected: (e) => {
        // e.payload is fully typed: { source, slot, role?, content?, ... }
      },
      onAgentTurnEnd: (e) => {
        // e.payload: { iterationCount, totalInputTokens, totalOutputTokens, ... }
      },
    };
    runner.attach(myRecorder);

The runner discovers the recorder’s handler methods via runtime method-shape detection, so implement only the events you care about and ignore the rest.
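One way to picture method-shape detection is a dispatcher that derives a handler name from the event name and calls it only if the recorder defines it. The naming rule below is an assumption inferred from the two handlers shown above (context.injected to onContextInjected, agent.turn_end to onAgentTurnEnd), and handlerNameFor, dispatch, and the Recorder type are hypothetical; the library's actual detection logic may differ.

```typescript
// Assumed rule: "agentfootprint.<domain>.<event_name>" -> "on<Domain><EventName>".
function handlerNameFor(eventName: string): string {
  const [, domain = '', event = ''] = eventName.split('.');
  const pascal = (s: string): string =>
    s
      .split('_')
      .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
      .join('');
  return `on${pascal(domain)}${pascal(event)}`;
}

type RecordedEvent = { name: string; payload: unknown };
type Recorder = { id: string } & Record<string, unknown>;

// Calls the matching handler on every recorder that implements it.
// Returns how many recorders received the event.
function dispatch(recorders: Recorder[], event: RecordedEvent): number {
  const method = handlerNameFor(event.name);
  let delivered = 0;
  for (const recorder of recorders) {
    const handler = recorder[method];
    if (typeof handler === 'function') {
      handler.call(recorder, event);
      delivered += 1;
    }
  }
  return delivered;
}

// Usage: this recorder implements one handler and silently skips everything else.
const seenEvents: string[] = [];
const demoRecorder: Recorder = {
  id: 'demo-recorder',
  onContextInjected: (e: RecordedEvent) => {
    seenEvents.push(e.name);
  },
};

const deliveredInjected = dispatch([demoRecorder], {
  name: 'agentfootprint.context.injected',
  payload: { slot: 'messages' },
});
const deliveredTick = dispatch([demoRecorder], {
  name: 'agentfootprint.cost.tick',
  payload: {},
});
```

A lookup like this is why a recorder needs no registration list: the set of methods it defines is the subscription.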

otelObservability ships in agentfootprint/observability-providers. It maps agentfootprint events onto OTel spans + attributes via your own @opentelemetry/api tracer. Pair with @opentelemetry/sdk-node and your favorite exporter (Jaeger, Tempo, Datadog OTel collector, etc.):

    import { otelObservability } from 'agentfootprint/observability-providers';
    import { microtaskBatchDriver } from 'footprintjs/detach';
    import { trace } from '@opentelemetry/api';

    const stop = agent.enable.observability({
      strategy: otelObservability({
        tracer: trace.getTracer('my-agent', '1.0.0'),
      }),
      // Export off the hot path, same as the AgentCore example.
      detach: { driver: microtaskBatchDriver, mode: 'forget' },
    });

  • Don’t log every event. 60+ typed events × N iterations × M agents = your log infra hates you. Filter to the domains you actually need.
  • Don’t subscribe in a hot loop. runner.on(...) returns an unsubscribe function; call it on shutdown. Forgotten subscriptions leak listeners across runs.
  • Don’t use getNarrative() as a log sink. It’s structured trace data, not log lines. Pass the entries to a real log/trace backend if you need to ship them.