
Testing

A junior dev pushes a test that runs your agent in a for (let i = 0; i < 1000; i++) loop. They forgot to set provider: mock(...). Their CI run hits the production OpenAI key. You discover this when finance forwards a $40,000 invoice. The mocks-first development workflow makes this scenario impossible — by default, your agents run against deterministic mocks and you never accidentally hit a paid API in a test.

| Surface | What it mocks |
| --- | --- |
| mock({ reply }) / mock({ replies }) | LLM provider: single reply or scripted multi-turn |
| mockMcpClient({ tools }) | MCP server: in-memory, no SDK install |
| mockEmbedder() | Embedder: deterministic vectors |
| InMemoryStore | Memory store: ephemeral, no Redis / Postgres / cloud |
| inline defineTool({ execute: async () => '...' }) | Tool implementations: pure closure, no real backend |

Tool-using agents are notoriously hard to test because the LLM’s behavior is non-deterministic. mock({ replies }) solves that: the entries in the array are consumed in order, one per agent-loop iteration (that is, one per LLM call):

examples/features/07-mock-multi-turn-replies.ts (region: scripted-replies)
mock({
  replies: [
    {
      toolCalls: [
        {
          id: 'call-1',
          name: 'lookup',
          args: { topic: 'refunds' } as Record<string, unknown>,
        },
      ],
    },
    { content: 'Refunds take 3 business days.' },
  ],
});

Iteration 1 calls the tool with the scripted args; iteration 2 returns the final answer. The tool actually executes; the agent loop is real. Only the LLM is mocked.

Exhaustion behavior: if the agent calls the LLM more times than there are entries in replies, the next call throws a clear error — “MockProvider exhausted: scripted N replies but received N+1” — so misnumbered scripts fail tests instead of silently looping forever.

provider.resetReplies() rewinds the cursor for cross-scenario reuse — useful when one test file runs many test cases against the same provider instance.
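The cursor, exhaustion, and reset behavior described above can be pictured with a small stand-alone sketch. ScriptedProvider and its next() method are hypothetical names used only for illustration; this is not agentfootprint's implementation, just a model of the behavior the text describes.

```typescript
// Sketch only: a stand-in for a scripted mock provider, NOT the library's code.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Reply = { content?: string; toolCalls?: ToolCall[] };

class ScriptedProvider {
  private cursor = 0;
  private replies: Reply[];

  constructor(replies: Reply[]) {
    this.replies = replies;
  }

  // Returns the next scripted reply, or throws once the script has run out.
  next(): Reply {
    const reply = this.replies[this.cursor];
    if (reply === undefined) {
      throw new Error(
        `MockProvider exhausted: scripted ${this.replies.length} replies but received ${this.cursor + 1}`,
      );
    }
    this.cursor += 1;
    return reply;
  }

  // Rewinds the cursor so the same script can be replayed by another test case.
  resetReplies(): void {
    this.cursor = 0;
  }
}

const provider = new ScriptedProvider([
  { toolCalls: [{ id: 'call-1', name: 'lookup', args: { topic: 'refunds' } }] },
  { content: 'Refunds take 3 business days.' },
]);

const first = provider.next(); // the scripted tool call
const second = provider.next(); // the final answer

let exhaustedMessage = '';
try {
  provider.next(); // third call, but only two replies were scripted
} catch (err) {
  exhaustedMessage = (err as Error).message;
}

provider.resetReplies();
const replayed = provider.next(); // the first entry again
```

In a real test file, the equivalent of the last two lines would typically live in a beforeEach hook, so that every test case starts from the first scripted reply.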

For agents that use MCP, mockMcpClient({ tools }) is a drop-in replacement for mcpClient(opts) — same McpClient interface, in-memory implementation. No subprocess, no network, no @modelcontextprotocol/sdk install needed:

import { Agent, mock, mockMcpClient } from 'agentfootprint';

// In-memory MCP server exposing a single scripted tool.
const mockServer = mockMcpClient({
  tools: [
    {
      name: 'list_files',
      description: 'List files in a directory',
      inputSchema: { type: 'object' },
      handler: async () => 'file1.txt\nfile2.txt',
    },
  ],
});

// The agent receives the mock server's tools exactly as it would a real server's.
const agent = Agent.create({ provider: mock({ reply: 'ok' }) })
  .tools(await mockServer.tools())
  .build();

When you ship to production, swap mockMcpClient for mcpClient — the rest of the agent code is identical.

InMemoryStore is the default memory adapter for tests. Ephemeral, isolated per-test, no infrastructure:

import { defineMemory, MEMORY_TYPES, MEMORY_STRATEGIES, InMemoryStore } from 'agentfootprint';

const memory = defineMemory({
  id: 'test-window',
  type: MEMORY_TYPES.EPISODIC,
  strategy: { kind: MEMORY_STRATEGIES.WINDOW, size: 10 },
  store: new InMemoryStore(),
});
// each test gets a fresh store via fresh definition

For tests that need to verify that the store was called correctly (e.g., “did the agent persist this fact?”), use Vitest spies on the store’s methods, or list the entries directly via store.list(identity).
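Both verification styles can be illustrated without the library. In the sketch below, FactStore, TinyStore, recordPuts, and the put method are all hypothetical stand-ins, and the Identity and Entry shapes are assumptions; only list(identity) is named in the text above, and the real InMemoryStore interface may differ.

```typescript
// Hypothetical store shape for illustration; the real InMemoryStore API may differ.
type Identity = { userId: string };
type Entry = { text: string };

interface FactStore {
  put(identity: Identity, entry: Entry): Promise<void>;
  list(identity: Identity): Promise<Entry[]>;
}

// Minimal in-memory store keyed by user id.
class TinyStore implements FactStore {
  private entries = new Map<string, Entry[]>();

  async put(identity: Identity, entry: Entry): Promise<void> {
    const existing = this.entries.get(identity.userId) ?? [];
    this.entries.set(identity.userId, [...existing, entry]);
  }

  async list(identity: Identity): Promise<Entry[]> {
    return this.entries.get(identity.userId) ?? [];
  }
}

// Wraps a store and records every put() call, like a hand-rolled spy.
function recordPuts(store: FactStore): { store: FactStore; calls: Array<[Identity, Entry]> } {
  const calls: Array<[Identity, Entry]> = [];
  const wrapped: FactStore = {
    put: async (identity, entry) => {
      calls.push([identity, entry]);
      await store.put(identity, entry);
    },
    list: (identity) => store.list(identity),
  };
  return { store: wrapped, calls };
}

async function demo(): Promise<{ callCount: number; listed: Entry[] }> {
  const { store, calls } = recordPuts(new TinyStore());
  const alice: Identity = { userId: 'alice' };
  // Stand-in for the agent persisting a fact during a run.
  await store.put(alice, { text: 'prefers email' });
  // Style 1: inspect the recorded calls. Style 2: list what was actually stored.
  return { callCount: calls.length, listed: await store.list(alice) };
}
```

With Vitest, vi.spyOn on the store's write method would take the place of recordPuts, assuming the store exposes such a method. Listing the entries is the more robust check, because it asserts on what was stored and not on how the agent stored it.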

The project’s own examples are tested end-to-end in CI:

npm run test:examples # tsc + tsx run for every example
npm run test:examples:typecheck # type-check only (faster)

Every example file in examples/ runs against mocks (no API keys); CI proves they still compile and execute correctly on every commit. Follow the same pattern in your own codebase: a mocks-first test suite is fast enough to run on every push.

  • Don’t mix mock and real providers in one test. If a test should hit a real API (integration test), mark it explicitly and isolate it from the mocked test pool.
  • Don’t share a single mock() instance across tests without resetReplies(). Tests will pollute each other’s cursor state.
  • Don’t test agent logic by mocking the entire Agent class. Mock the BOUNDARIES (provider, store, MCP); let the real Agent loop run. That’s where bugs hide.
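One way to keep integration tests out of the mocked pool is an explicit opt-in gate. The sketch below is an assumption, not part of agentfootprint: the helper name integrationEnabled and the RUN_INTEGRATION variable are hypothetical, and OPENAI_API_KEY stands in for whichever key your provider reads.

```typescript
// Hypothetical gate: integration tests run only when explicitly opted in.
type Env = Record<string, string | undefined>;

function integrationEnabled(env: Env): boolean {
  const optedIn = env['RUN_INTEGRATION'] === '1';
  const key = env['OPENAI_API_KEY'];
  // Both the opt-in flag and a non-empty key are required; either alone is not enough.
  return optedIn && typeof key === 'string' && key.length > 0;
}

// A key that happens to be present in CI does not enable integration tests on its own.
const keyOnly = integrationEnabled({ OPENAI_API_KEY: 'sk-test' });
const flagOnly = integrationEnabled({ RUN_INTEGRATION: '1' });
const optedIn = integrationEnabled({ RUN_INTEGRATION: '1', OPENAI_API_KEY: 'sk-test' });
```

In a Vitest suite, this could be wired up as describe.skipIf(!integrationEnabled(process.env)) around the integration block, so that the default run stays entirely on mocks.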