Deployment

Friday afternoon, 4:50 PM. You’re about to push your agent service to production for the first time. What’s on the checklist? Not “did the tests pass” (CI already proved that), but the higher-stakes items: tenant isolation, peer dependencies installed, observability hooks wired, secrets out of the codebase. This guide is that checklist.

The mocks-first → prod-swap workflow

If you developed against the mocks-first stack, the deployment is mechanical:

| Boundary | Dev (mock) | Prod (one-line swap) |
| --- | --- | --- |
| LLM provider | `mock({ reply })` | `anthropic({...})` · `openai({...})` · `bedrock({...})` |
| Embedder | `mockEmbedder()` | OpenAI / Cohere / Bedrock embedder factory (planned v2.6) |
| Memory store | `InMemoryStore` | `RedisStore` (`agentfootprint/memory-redis`) · `AgentCoreStore` (`agentfootprint/memory-agentcore`) |
| MCP server | `mockMcpClient({ tools })` | `mcpClient({ transport })` |
| Tool execute | inline closure | real implementation |

The flowchart, recorders, narrative, and tests don’t change. Ship the patterns first; pay for tokens last.
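One way to keep the swap to a single line is to choose the provider in one place, keyed on the environment. The sketch below assumes nothing about agentfootprint's own types: `Provider`, `ProviderFactories`, and `chooseProvider` are hypothetical stand-ins, and in a real service the two factories would wrap `mock({ reply })` and `anthropic({...})`.

```typescript
// Hypothetical stand-ins for illustration; not agentfootprint's real types.
interface Provider {
  readonly kind: string;
}

interface ProviderFactories {
  mock: () => Provider;               // would wrap mock({ reply })
  real: (apiKey: string) => Provider; // would wrap anthropic({ apiKey })
}

type Env = Record<string, string | undefined>;

// Picks the provider once, at startup. Everything downstream receives a
// Provider and never branches on the environment again.
function chooseProvider(env: Env, factories: ProviderFactories): Provider {
  if (env.NODE_ENV !== "production") {
    return factories.mock();
  }
  const apiKey = env.ANTHROPIC_API_KEY;
  if (!apiKey) {
    throw new Error("ANTHROPIC_API_KEY must be set when NODE_ENV=production");
  }
  return factories.real(apiKey);
}
```

Keeping the branch in a single function means the rest of the codebase stays identical between dev and prod, which is the point of the mocks-first workflow.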

Production checklist

1. Multi-tenant identity at every `agent.run()`

Every memory call is namespaced by a MemoryIdentity. The default, { conversationId: '_global' }, is fine for prototypes but dangerous in production multi-tenant apps. Pass a per-tenant identity at every call site:

const identity = {
  tenant: req.tenantId,
  principal: req.userId,
  conversationId: req.threadId,
};
await agent.run({ message: req.body.message, identity });

A bug that omits tenant surfaces as “no data found”, never as a cross-tenant leak, because adapters refuse cross-tenant reads at the storage boundary.
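To make a missing tenant fail loudly at the edge of the service, the identity can be built by a small guard. This is a minimal sketch: the field names mirror the identity object shown above, but `IncomingRequest`, `identityFromRequest`, and the local `MemoryIdentity` interface are assumptions for illustration, and the library's real type may carry more fields.

```typescript
// Local stand-in mirroring the fields shown above (assumed shape).
interface MemoryIdentity {
  tenant: string;
  principal: string;
  conversationId: string;
}

// Hypothetical request shape; adapt to your HTTP framework.
interface IncomingRequest {
  tenantId?: string;
  userId?: string;
  threadId?: string;
}

// Builds the identity or throws, so a request with no tenant is rejected
// before it reaches the agent instead of running under a shared default.
function identityFromRequest(req: IncomingRequest): MemoryIdentity {
  const missing = (["tenantId", "userId", "threadId"] as const).filter(
    (key) => !req[key],
  );
  if (missing.length > 0) {
    throw new Error(`Missing identity fields: ${missing.join(", ")}`);
  }
  return {
    tenant: req.tenantId as string,
    principal: req.userId as string,
    conversationId: req.threadId as string,
  };
}
```

The call site then becomes `agent.run({ message, identity: identityFromRequest(req) })`, and there is no code path that reaches memory without all three fields.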

2. Peer-dep SDKs installed

agentfootprint declares its provider and store SDKs as optional peer dependencies (via peerDependenciesMeta). Install only what you use:

npm install agentfootprint footprintjs   # core (always)
npm install @anthropic-ai/sdk            # if using anthropic()
npm install openai                       # if using openai()
npm install @aws-sdk/client-bedrock-runtime              # if using bedrock()
npm install ioredis                                      # if using RedisStore
npm install @aws-sdk/client-bedrock-agent-runtime        # if using AgentCoreStore
npm install @modelcontextprotocol/sdk                    # if using mcpClient()

Each SDK is lazy-required at first call, and a missing one fails with a friendly install hint. Running npm install agentfootprint on its own is enough for mocks-first dev.
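The lazy-require behaviour can be pictured with a small wrapper. This is an illustration of the general pattern only, not agentfootprint's internal loader: `loadPeer` and `lazyPeer` are hypothetical names, and the `load` callback stands in for whatever actually resolves the module (for example `() => require("ioredis")`).

```typescript
// Runs the loader and converts a failure into an actionable install hint.
function loadPeer<T>(pkg: string, feature: string, load: () => T): T {
  try {
    return load();
  } catch (cause) {
    const detail = cause instanceof Error ? cause.message : String(cause);
    throw new Error(
      `${feature} needs the optional peer dependency "${pkg}". ` +
        `Install it with: npm install ${pkg} (original error: ${detail})`,
    );
  }
}

// Defers loading until the first call and caches the result, so importing
// the library never touches SDKs the application does not use.
function lazyPeer<T>(pkg: string, feature: string, load: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (!cached) {
      cached = { value: loadPeer(pkg, feature, load) };
    }
    return cached.value;
  };
}
```

One consequence of this pattern is that a missing SDK only shows up when the feature is first exercised, so it is worth hitting each provider and store once in a smoke test before the first production deploy.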

3. Secrets in env, not code

Provider factories accept apiKey as an option. Read it from process.env:

const provider = anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });

Never commit keys. Use a secrets manager (AWS Secrets Manager, HashiCorp Vault, Doppler, etc.) for production.
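The non-null assertion (`!`) in the snippet above silences the compiler but does nothing at runtime, so an unset variable only fails on the first provider call. A minimal sketch of a fail-fast alternative follows; `requireEnv` and `missingEnv` are hypothetical helpers, not part of agentfootprint.

```typescript
type EnvVars = Record<string, string | undefined>;

// Returns the named variable or throws. The error names the variable but
// never echoes a value, so nothing secret reaches the logs.
function requireEnv(env: EnvVars, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Lists every unset or blank name in one pass, so a deploy can fail with
// the complete list instead of one variable per restart.
function missingEnv(env: EnvVars, names: readonly string[]): string[] {
  return names.filter((name) => (env[name] ?? "").trim() === "");
}
```

At startup this might be used as `anthropic({ apiKey: requireEnv(process.env, "ANTHROPIC_API_KEY") })`, which turns a misconfigured deploy into an immediate, named failure.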

4. Observability hooks wired

Pick one per concern:

Status line — agent.enable.thinking({ onStatus }) (terminal UIs, chat typing indicators)
Structured logs — agent.enable.logging({ domains, logger }) (pino, winston, console)
Cost tracking — pass pricingTable + costBudget to Agent.create()
Custom recorders — agent.attach(myRecorder) for aggregation

See the Observability guide for the full surface.
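For the cost-tracking concern, the arithmetic behind a pricing table and a budget is simple enough to sketch. The shapes below (`ModelPrice`, `PricingTable`, `Usage`, `CostMeter`) are assumptions made for illustration; the actual pricingTable and costBudget options accepted by Agent.create() may be structured differently.

```typescript
// Hypothetical shapes, not agentfootprint's real option types.
interface ModelPrice {
  inputPerMillion: number;  // USD per 1,000,000 input tokens
  outputPerMillion: number; // USD per 1,000,000 output tokens
}

type PricingTable = Record<string, ModelPrice>;

interface Usage {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Cost of a single call: tokens scaled to millions, times the unit price.
function callCost(table: PricingTable, usage: Usage): number {
  const price = table[usage.model];
  if (!price) {
    throw new Error(`No price configured for model "${usage.model}"`);
  }
  return (
    (usage.inputTokens / 1e6) * price.inputPerMillion +
    (usage.outputTokens / 1e6) * price.outputPerMillion
  );
}

// Running total for one agent run; reports when the budget is exceeded.
class CostMeter {
  private spentUsd = 0;
  private readonly table: PricingTable;
  private readonly budgetUsd: number;

  constructor(table: PricingTable, budgetUsd: number) {
    this.table = table;
    this.budgetUsd = budgetUsd;
  }

  record(usage: Usage): { spentUsd: number; overBudget: boolean } {
    this.spentUsd += callCost(this.table, usage);
    return { spentUsd: this.spentUsd, overBudget: this.spentUsd > this.budgetUsd };
  }
}
```

With placeholder prices of 2 USD per million input tokens and 8 USD per million output tokens, a call that uses 500,000 input tokens and 250,000 output tokens costs 0.5 × 2 + 0.25 × 8 = 3 USD, so a 5 USD budget is exceeded on the second such call.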

5. Resilience decorators wrapping the provider

Wrap production providers in retry and fallback decorators:

const provider = withRetry(
  withFallback(
    anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! }),
    openai({ apiKey: process.env.OPENAI_API_KEY! }),
  ),
  { maxAttempts: 5 },
);

See the Resilience guide for the withRetry / withFallback / resilientProvider decorators.
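To show what the composition above does at runtime, here is a minimal sketch of the decorator pattern over a simplified provider type. These are deliberately named `sketchRetry` and `sketchFallback` because they are not agentfootprint's withRetry / withFallback; the `Complete` type, the `baseDelayMs` option, and the exponential backoff are assumptions for illustration.

```typescript
// Simplified provider: one prompt in, one completion out.
type Complete = (prompt: string) => Promise<string>;

// Tries `primary`; if it rejects, hands the same prompt to `secondary`.
function sketchFallback(primary: Complete, secondary: Complete): Complete {
  return async (prompt) => {
    try {
      return await primary(prompt);
    } catch {
      return secondary(prompt);
    }
  };
}

// Re-invokes `inner` until it resolves or maxAttempts is used up, waiting
// baseDelayMs * 2^(attempt - 1) between tries.
function sketchRetry(
  inner: Complete,
  opts: { maxAttempts: number; baseDelayMs?: number },
): Complete {
  if (opts.maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  const baseDelayMs = opts.baseDelayMs ?? 100;
  return async (prompt) => {
    let lastError: unknown;
    for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
      try {
        return await inner(prompt);
      } catch (err) {
        lastError = err;
        if (attempt < opts.maxAttempts) {
          const delay = baseDelayMs * 2 ** (attempt - 1);
          await new Promise<void>((resolve) => setTimeout(resolve, delay));
        }
      }
    }
    throw lastError;
  };
}
```

With retry on the outside and fallback on the inside, as in the snippet above, each attempt tries the primary and then the secondary, so the retry budget is only spent when both providers fail in the same attempt.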

6. Pause/resume infrastructure

If your agent uses pauseHere or askHuman, wire up checkpoint persistence and the resume path:

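// On pause: persist the checkpoint, hand off to the human workflow, end the request
const result = await agent.run({...});
if (isPaused(result)) {
  await db.save('pause:' + sessionId, JSON.stringify(result.checkpoint));
  triggerHumanWorkflow(result.pauseData);
  return; // request done
}

// On human reply (different process / day): reload the checkpoint and resume
const saved = await db.get('pause:' + sessionId);
if (!saved) throw new Error('No saved checkpoint for session ' + sessionId);
const checkpoint = JSON.parse(saved);
const finalResult = await agent.resume(checkpoint, humanAnswer);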

See the Pause/Resume guide.
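The `db` object in the snippet above is left to the application. As a minimal sketch of what it needs to provide, the store below keeps JSON checkpoints under the same `pause:` key prefix. `CheckpointStore` and `InMemoryCheckpointStore` are hypothetical names, and the Map is only for illustration: a real deployment would back the same interface with Redis or a database so that any instance can resume.

```typescript
// Hypothetical interface; not part of agentfootprint.
interface CheckpointStore {
  save(sessionId: string, checkpoint: unknown): Promise<void>;
  take(sessionId: string): Promise<unknown>;
}

class InMemoryCheckpointStore implements CheckpointStore {
  private readonly rows = new Map<string, string>();

  async save(sessionId: string, checkpoint: unknown): Promise<void> {
    // Serialise on write so values JSON cannot encode (circular references,
    // BigInt) throw here rather than on resume.
    this.rows.set("pause:" + sessionId, JSON.stringify(checkpoint));
  }

  // Returns the checkpoint and deletes it, so a duplicate human reply
  // cannot resume the same pause twice. Resolves to undefined if absent.
  async take(sessionId: string): Promise<unknown> {
    const key = "pause:" + sessionId;
    const raw = this.rows.get(key);
    if (raw === undefined) {
      return undefined;
    }
    this.rows.delete(key);
    return JSON.parse(raw);
  }
}
```

Reading and deleting in one step is a design choice rather than a requirement: it trades the ability to retry a failed resume for protection against double submission, so a store that must support retries would delete only after resume succeeds.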

Multi-instance considerations

CircuitBreaker state is per-process today, so multi-instance deploys don’t share circuit state: each instance trips and recovers on its own. This is acceptable for v2.4; v2.5 adds shared-state options as part of the Reliability subsystem.
Memory stores are multi-instance friendly when backed by Redis, AgentCore, or another external database.
Pause/resume checkpoints are JSON; any instance can resume any pause given the checkpoint.
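What per-process circuit state means in practice can be shown with a toy breaker. `TinyBreaker` is a minimal counter-based stand-in written for this illustration, not agentfootprint's CircuitBreaker, and the threshold of 3 is arbitrary.

```typescript
// Opens after `threshold` consecutive failures; any success resets the count.
class TinyBreaker {
  private failures = 0;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  recordFailure(): void {
    this.failures += 1;
  }

  recordSuccess(): void {
    this.failures = 0;
  }

  get isOpen(): boolean {
    return this.failures >= this.threshold;
  }
}

// Each object stands in for the breaker held by one server process.
const instanceA = new TinyBreaker(3);
const instanceB = new TinyBreaker(3);

// Instance A sees three consecutive provider failures and opens.
for (let i = 0; i < 3; i++) {
  instanceA.recordFailure();
}

// Instance B has seen none, so it keeps sending traffic to the failing
// provider until it has counted three failures of its own.
const states = { a: instanceA.isOpen, b: instanceB.isOpen };
```

Here `states.a` is true and `states.b` is false. The practical effect is that an outage is detected once per instance, so the number of failed calls before every breaker has opened grows with the instance count.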

What’s NOT here yet

Rate-limit budget enforcement → Reliability subsystem v2.5
3-tier output fallback → Reliability subsystem v2.5
Per-agent IAM-style policy → Governance subsystem v2.6
DynamoDB / Postgres / Pinecone memory adapters → v2.6

Anti-patterns

Don’t wire production keys into your test suite. Use mock(...) for tests and reserve real keys for a staging environment.
Don’t omit identity in production. Default-global is a footgun.
Don’t cache the agent instance across requests when memory is involved. Build a fresh agent per request; the memory layer carries state between requests.

Next steps

Quick Start — the mocks-first workflow
Memory store adapters — production backend matrix
Resilience guide — production-grade decorators