Testing
A junior dev pushes a test that runs your agent in a `for (let i = 0; i < 1000; i++)` loop. They forgot to set `provider: mock(...)`. Their CI run hits the production OpenAI key. You discover this when finance forwards a $40,000 invoice. The mocks-first development workflow makes this scenario impossible: by default, your agents run against deterministic mocks and you never accidentally hit a paid API in a test.
Three mock surfaces, three boundaries
| Surface | What it mocks |
|---|---|
| `mock({ reply })` / `mock({ replies })` | LLM provider: single reply or scripted multi-turn |
| `mockMcpClient({ tools })` | MCP server: in-memory, no SDK install |
| `mockEmbedder()` | Embedder: deterministic vectors |
| `InMemoryStore` | Memory store: ephemeral, no Redis / Postgres / cloud |
| inline `defineTool({ execute: async () => '...' })` | Tool implementations: pure closure, no real backend |
Scripted multi-turn agent test
Tool-using agents are notoriously hard to test because the LLM’s behavior is non-deterministic. `mock({ replies })` solves that: the entries in the array are consumed in order, one per LLM call, so each iteration of the agent loop receives the next scripted reply:

```ts
mock({
  replies: [
    {
      toolCalls: [
        {
          id: 'call-1',
          name: 'lookup',
          args: { topic: 'refunds' } as Record<string, unknown>,
        },
      ],
    },
    { content: 'Refunds take 3 business days.' },
  ],
});
```

Iteration 1 calls the tool with the scripted args; iteration 2 returns the final answer. The tool actually executes; the agent loop is real. Only the LLM is mocked.
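To make that division of labour concrete, here is a minimal, self-contained sketch of the idea: a scripted reply list drives a loop, and the tool is an ordinary function that really runs. The names `runScriptedLoop`, `Reply`, and `Tool` are hypothetical, and the loop is synchronous for brevity. This is not agentfootprint’s implementation, only an illustration of what the mock replaces and what it leaves alone.

```typescript
// Hypothetical shapes for illustration; not agentfootprint's real types.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Reply = { toolCalls?: ToolCall[]; content?: string };
type Tool = (args: Record<string, unknown>) => string;

// Simplified, synchronous agent loop: one scripted reply per iteration.
function runScriptedLoop(
  replies: Reply[],
  tools: Record<string, Tool>,
): { answer: string; toolResults: string[] } {
  const toolResults: string[] = [];
  for (const reply of replies) {
    if (reply.toolCalls && reply.toolCalls.length > 0) {
      for (const call of reply.toolCalls) {
        const tool = tools[call.name];
        if (!tool) throw new Error(`Unknown tool: ${call.name}`);
        toolResults.push(tool(call.args)); // the real tool code runs here
      }
      continue; // the next iteration consumes the next scripted reply
    }
    return { answer: reply.content ?? '', toolResults };
  }
  throw new Error('Script ended without a final answer');
}

const result = runScriptedLoop(
  [
    { toolCalls: [{ id: 'call-1', name: 'lookup', args: { topic: 'refunds' } }] },
    { content: 'Refunds take 3 business days.' },
  ],
  { lookup: (args) => `policy for ${String(args['topic'])}` },
);
```

In this sketch only the reply list is scripted; the `lookup` function and the loop control flow execute normally, which is the property that makes bugs in tool wiring visible to the test.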
Exhaustion behavior: if the agent calls the LLM more times than there are entries in replies, the next call throws a clear error — “MockProvider exhausted: scripted N replies but received N+1” — so misnumbered scripts fail tests instead of silently looping forever.
provider.resetReplies() rewinds the cursor for cross-scenario reuse — useful when one test file runs many test cases against the same provider instance.
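The cursor, exhaustion, and reset semantics described above can be modelled in a few lines. The class below is a hypothetical stand-in written for this page (the name `ScriptedReplies`, the `next()` method, and the exact error wording are assumptions); it only illustrates why an over-long agent run fails loudly and why a shared instance needs a reset between scenarios.

```typescript
// Hypothetical stand-in for a scripted mock provider; not the library's code.
class ScriptedReplies {
  private readonly replies: readonly string[];
  private cursor = 0;

  constructor(replies: readonly string[]) {
    this.replies = replies;
  }

  // Returns the next scripted reply, or throws once the script is used up.
  next(): string {
    const reply = this.replies[this.cursor];
    if (reply === undefined) {
      const n = this.replies.length;
      throw new Error(`exhausted: scripted ${n} replies but received ${n + 1}`);
    }
    this.cursor += 1;
    return reply;
  }

  // Rewinds the cursor so the same instance can serve another scenario.
  resetReplies(): void {
    this.cursor = 0;
  }
}

const provider = new ScriptedReplies(['first', 'second']);
const seen = [provider.next(), provider.next()];

let exhaustedMessage = '';
try {
  provider.next(); // third call, but only two replies were scripted
} catch (err) {
  exhaustedMessage = err instanceof Error ? err.message : String(err);
}

provider.resetReplies();
const afterReset = provider.next(); // back at the first scripted reply
```

Without the `resetReplies()` call, the final `next()` would throw again, which is the cursor pollution that a shared instance causes across tests.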
Mocking MCP without subprocess
For agents that use MCP, `mockMcpClient({ tools })` is a drop-in replacement for `mcpClient(opts)`: same `McpClient` interface, in-memory implementation. No subprocess, no network, no `@modelcontextprotocol/sdk` install needed:

```ts
import { Agent, mock, mockMcpClient } from 'agentfootprint';

// In-memory MCP server exposing a single tool
const mockServer = mockMcpClient({
  tools: [
    {
      name: 'list_files',
      description: 'List files in a directory',
      inputSchema: { type: 'object' },
      handler: async () => 'file1.txt\nfile2.txt',
    },
  ],
});

const agent = Agent.create({ provider: mock({ reply: 'ok' }) })
  .tools(await mockServer.tools())
  .build();
```

When you ship to production, swap `mockMcpClient` for `mcpClient`; the rest of the agent code is identical.
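The swap works because calling code depends on an interface rather than on a transport. The sketch below illustrates that idea with hypothetical names (`ToolSource`, `inMemoryToolSource`, `callTool`); the real `McpClient` interface may have a different shape, so treat this as a model of the pattern rather than of the library.

```typescript
// Hypothetical shapes for illustration; the real McpClient interface may differ.
interface ToolSpec {
  name: string;
  handler: (args: Record<string, unknown>) => Promise<string>;
}

interface ToolSource {
  tools(): Promise<ToolSpec[]>;
}

// In-memory implementation: no subprocess, no network.
function inMemoryToolSource(tools: ToolSpec[]): ToolSource {
  return { tools: async () => tools };
}

// Calling code sees only the interface, so a client backed by a real
// server could be passed in without changing this function.
async function callTool(
  source: ToolSource,
  name: string,
  args: Record<string, unknown>,
): Promise<string> {
  const tool = (await source.tools()).find((t) => t.name === name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool.handler(args);
}

const testSource = inMemoryToolSource([
  { name: 'list_files', handler: async () => 'file1.txt\nfile2.txt' },
]);
const listing: Promise<string> = callTool(testSource, 'list_files', {});
```

Because `callTool` never learns which implementation it was given, the test and production paths exercise the same code.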
Memory tests with InMemoryStore
`InMemoryStore` is the default memory adapter for tests. Ephemeral, isolated per-test, no infrastructure:

```ts
import { defineMemory, MEMORY_TYPES, MEMORY_STRATEGIES, InMemoryStore } from 'agentfootprint';

const memory = defineMemory({
  id: 'test-window',
  type: MEMORY_TYPES.EPISODIC,
  strategy: { kind: MEMORY_STRATEGIES.WINDOW, size: 10 },
  store: new InMemoryStore(),
});
// each test gets a fresh store via fresh definition
```

For tests that need to verify the store was called correctly (e.g., “did the agent persist this fact?”), use Vitest spies on the store’s methods, or list entries directly via `store.list(identity)`.
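Both verification styles can be sketched without any test framework. The store below is a hypothetical stand-in (`TinyStore`, its `put` method, and the string identity are assumptions, not the `InMemoryStore` API); in a Vitest suite, `vi.spyOn(store, 'put')` would play the role of the hand-rolled spy.

```typescript
// Hypothetical store for illustration; `put` and the string identity are assumptions.
class TinyStore {
  private readonly entries = new Map<string, string[]>();

  put(identity: string, value: string): void {
    const existing = this.entries.get(identity) ?? [];
    this.entries.set(identity, [...existing, value]);
  }

  list(identity: string): string[] {
    return this.entries.get(identity) ?? [];
  }
}

// Code under test: persists a fact through whatever store it is given.
function rememberFact(store: TinyStore, identity: string, fact: string): void {
  store.put(identity, fact);
}

// A fresh store per test keeps scenarios isolated.
const store = new TinyStore();

// Hand-rolled spy: record each call, then delegate to the original method.
const putCalls: Array<[string, string]> = [];
const originalPut = store.put.bind(store);
store.put = (identity, value) => {
  putCalls.push([identity, value]);
  originalPut(identity, value);
};

rememberFact(store, 'user-1', 'prefers email');

// Style 1: inspect the recorded calls. Style 2: list what was stored.
const stored = store.list('user-1');
```

Spying checks how the store was called, while listing checks what ended up in it; the second style tends to survive refactors of the write path better.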
CI integration
The project’s own examples are tested end-to-end in CI:

```sh
npm run test:examples            # tsc + tsx run for every example
npm run test:examples:typecheck  # type-check only (faster)
```

Every example file in `examples/` runs against mocks (no API keys); CI proves they still compile and execute correctly on every commit. Follow the same pattern in your own codebase: a mocks-first test suite is fast enough to run on every push.
Anti-patterns
- Don’t mix mock and real providers in one test. If a test should hit a real API (integration test), mark it explicitly and isolate it from the mocked test pool.
- Don’t share a single `mock()` instance across tests without `resetReplies()`. Tests will pollute each other’s cursor state.
- Don’t test agent logic by mocking the entire `Agent` class. Mock the boundaries (provider, store, MCP) and let the real Agent loop run. That’s where bugs hide.
Next steps
- Quick Start § mocks-first: the workflow recommendation
- MCP integration: `mockMcpClient` shape + production swap
- Memory guide: `InMemoryStore` for ephemeral test scenarios