Testing

A junior dev pushes a test that runs your agent in a `for (let i = 0; i < 1000; i++)` loop. They forgot to set `provider: mock(...)`. Their CI run hits the production OpenAI key. You discover this when finance forwards a $40,000 invoice. The mocks-first development workflow prevents this scenario: by default, your agents run against deterministic mocks, so a test never hits a paid API by accident.

Mock surfaces and the boundaries they cover

Surface	What it mocks
`mock({ reply })` / `mock({ replies })`	LLM provider — single reply or scripted multi-turn
`mockMcpClient({ tools })`	MCP server — in-memory, no SDK install
`mockEmbedder()`	Embedder — deterministic vectors
`InMemoryStore`	Memory store — ephemeral, no Redis / Postgres / cloud
inline `defineTool({ execute: async () => '...' })`	Tool implementations — pure closure, no real backend

Scripted multi-turn agent test

Tool-using agents are notoriously hard to test because the LLM's behavior is non-deterministic. `mock({ replies })` removes that variability. The entries in the array are consumed in order, one per agent-loop iteration:

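import { mock } from 'agentfootprint'; // assumed to be exported alongside mockMcpClient

const provider = mock({
  replies: [
    // Iteration 1: the scripted LLM requests a tool call.
    {
      toolCalls: [
        {
          id: 'call-1',
          name: 'lookup',
          args: { topic: 'refunds' } as Record<string, unknown>,
        },
      ],
    },
    // Iteration 2: the scripted LLM returns the final answer.
    { content: 'Refunds take 3 business days.' },
  ],
});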

Iteration 1 calls the tool with the scripted args; iteration 2 returns the final answer. The tool actually executes; the agent loop is real. Only the LLM is mocked.
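To make the "only the LLM is mocked" boundary concrete, here is a minimal, self-contained sketch of a scripted loop. It does not use or reproduce agentfootprint's internals: `Reply`, `Tool`, and `runLoop` are hypothetical stand-ins, written only to show that the tool function really executes while the model's turns come from a script.

```typescript
// Illustrative stand-in types; these are NOT the agentfootprint API.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Reply = { content?: string; toolCalls?: ToolCall[] };
type Tool = (args: Record<string, unknown>) => Promise<string>;

// Hypothetical loop: take the next scripted reply, run any requested tools, repeat.
async function runLoop(
  replies: Reply[],
  tools: Map<string, Tool>,
): Promise<{ answer: string; toolResults: string[] }> {
  const toolResults: string[] = [];
  for (const reply of replies) {
    const calls = reply.toolCalls ?? [];
    if (calls.length === 0) {
      // No tool calls means this reply is the final answer.
      return { answer: reply.content ?? '', toolResults };
    }
    for (const call of calls) {
      const tool = tools.get(call.name);
      if (tool === undefined) throw new Error(`Unknown tool: ${call.name}`);
      toolResults.push(await tool(call.args)); // the tool body really runs
    }
  }
  throw new Error('Script ended without a final answer');
}

// A real (if trivial) tool implementation that records what it was asked.
const seenTopics: unknown[] = [];
const lookup: Tool = async (args) => {
  seenTopics.push(args['topic']);
  return `policy text for ${String(args['topic'])}`;
};

const resultPromise = runLoop(
  [
    { toolCalls: [{ id: 'call-1', name: 'lookup', args: { topic: 'refunds' } }] },
    { content: 'Refunds take 3 business days.' },
  ],
  new Map([['lookup', lookup]]),
);
```

A test built this way can assert on both sides of the boundary: the final answer that came from the script, and the arguments the tool actually received.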

Exhaustion behavior: if the agent calls the LLM more times than there are entries in `replies`, the next call throws a clear error ("MockProvider exhausted: scripted N replies but received N+1"), so a miscounted script fails the test instead of silently looping forever.

`provider.resetReplies()` rewinds the cursor so one provider instance can be reused across scenarios, which is useful when one test file runs many test cases against the same instance.
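The cursor semantics described above (consume in order, throw on exhaustion, rewind on reset) can be sketched in a few lines. This is an assumed model of the behavior, not the library's `MockProvider` source; `ScriptedProvider` and `next()` are hypothetical names, and the error wording is only an approximation.

```typescript
// Illustrative model of a scripted-reply cursor; NOT the library's implementation.
class ScriptedProvider {
  private readonly replies: readonly string[];
  private cursor = 0;

  constructor(replies: readonly string[]) {
    this.replies = replies;
  }

  // Return the next scripted reply, or throw once the script is used up.
  next(): string {
    const reply = this.replies[this.cursor];
    if (reply === undefined) {
      throw new Error(
        `exhausted: scripted ${this.replies.length} replies but received ${this.cursor + 1} calls`,
      );
    }
    this.cursor += 1;
    return reply;
  }

  // Rewind so the same instance can serve another scenario.
  resetReplies(): void {
    this.cursor = 0;
  }
}

const scripted = new ScriptedProvider(['first', 'second']);
const a = scripted.next();
const b = scripted.next();

let exhausted = false;
try {
  scripted.next(); // third call against a two-entry script
} catch {
  exhausted = true;
}

scripted.resetReplies();
const afterReset = scripted.next();
```

The same model explains the cross-test pollution problem: without a reset, the second test to touch a shared instance starts from wherever the first test left the cursor.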

Mocking MCP without a subprocess

For agents that use MCP, `mockMcpClient({ tools })` is a drop-in replacement for `mcpClient(opts)`: it exposes the same `McpClient` interface with an in-memory implementation. No subprocess, no network, and no `@modelcontextprotocol/sdk` install are needed:

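// Agent and mock are assumed to be exported from the same package as mockMcpClient.
import { Agent, mock, mockMcpClient } from 'agentfootprint';

// In-memory MCP server: each tool's handler runs in-process.
const mockServer = mockMcpClient({
  tools: [
    {
      name: 'list_files',
      description: 'List files in a directory',
      inputSchema: { type: 'object' },
      handler: async () => 'file1.txt\nfile2.txt',
    },
  ],
});

// The agent receives the mock server's tools exactly as it would receive real ones.
const agent = Agent.create({ provider: mock({ reply: 'ok' }) })
  .tools(await mockServer.tools())
  .build();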

When you ship to production, swap `mockMcpClient` for `mcpClient`; the rest of the agent code is identical.
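The swap works because the calling code depends only on a shared interface. The sketch below illustrates that design choice with hypothetical names (`ToolSource`, `inMemorySource`, `describeTools`); the real `McpClient` type has its own shape, which this does not attempt to reproduce.

```typescript
// Hypothetical stand-in for a shared client interface; the real McpClient type may differ.
interface ToolSource {
  tools(): Promise<string[]>;
}

// Test-side implementation: everything lives in memory.
function inMemorySource(names: string[]): ToolSource {
  return { tools: async () => [...names] };
}

// Consumer code sees only the interface, so it is the same in tests and in production.
async function describeTools(source: ToolSource): Promise<string> {
  const names = await source.tools();
  return names.join(', ');
}

const described = describeTools(inMemorySource(['list_files', 'read_file']));
```

In production, a second implementation of the same interface (one backed by a real connection) would be passed to `describeTools` with no change to the function itself.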

Memory tests with InMemoryStore

`InMemoryStore` is the default memory adapter for tests. It is ephemeral, isolated per test, and needs no infrastructure:

import { defineMemory, MEMORY_TYPES, MEMORY_STRATEGIES, InMemoryStore } from 'agentfootprint';

const memory = defineMemory({
  id: 'test-window',
  type: MEMORY_TYPES.EPISODIC,
  strategy: { kind: MEMORY_STRATEGIES.WINDOW, size: 10 },
  store: new InMemoryStore(),
});

// Each test gets a fresh store by creating a fresh definition.

For tests that need to verify that the store was called correctly (for example, "did the agent persist this fact?"), either use vitest spies on the store's methods or list the entries directly via `store.list(identity)`.
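Both verification styles can be shown without a test framework. In this sketch, `FakeStore` is a stand-in with hypothetical `put` and `list` methods (the real `InMemoryStore` method names and signatures may differ), and `spyOnPut` is a hand-rolled substitute for a vitest spy.

```typescript
// Stand-in store; the put/list signatures here are assumptions, not InMemoryStore's API.
type Entry = { identity: string; fact: string };

class FakeStore {
  private entries: Entry[] = [];

  async put(identity: string, fact: string): Promise<void> {
    this.entries.push({ identity, fact });
  }

  async list(identity: string): Promise<string[]> {
    return this.entries.filter((e) => e.identity === identity).map((e) => e.fact);
  }
}

// Hand-rolled spy: records every put() call, then delegates to the original method.
function spyOnPut(store: FakeStore): Array<[string, string]> {
  const calls: Array<[string, string]> = [];
  const original = store.put.bind(store);
  store.put = async (identity: string, fact: string): Promise<void> => {
    calls.push([identity, fact]);
    await original(identity, fact);
  };
  return calls;
}

// Code under test: something that is expected to persist a fact.
async function rememberPreference(store: FakeStore, user: string): Promise<void> {
  await store.put(user, 'prefers email');
}

const store = new FakeStore();
const putCalls = spyOnPut(store); // style 1: inspect how the store was called
const listedFacts = rememberPreference(store, 'user-1').then(() =>
  store.list('user-1'), // style 2: inspect what the store now contains
);
```

Spying checks the interaction (which calls were made), while listing checks the outcome (what was persisted). Listing tends to survive refactors better because it does not pin down which method did the writing.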

CI integration

The project’s own examples are tested end-to-end in CI:

npm run test:examples           # tsc + tsx run for every example
npm run test:examples:typecheck # type-check only (faster)

Every example file in examples/ runs against mocks (no API keys), so CI proves on every commit that the examples still compile and execute correctly. Apply the same pattern to your own codebase: a mocks-first test suite is fast enough to run on every push.

Anti-patterns

Don't mix mock and real providers in one test. If a test should hit a real API (an integration test), mark it explicitly and isolate it from the mocked test pool.
Don't share a single `mock()` instance across tests without calling `resetReplies()`. Otherwise tests pollute each other's cursor state.
Don't test agent logic by mocking the entire `Agent` class. Mock the boundaries (provider, store, MCP) and let the real Agent loop run. That's where bugs hide.
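One way to keep integration tests explicitly marked and isolated is an opt-in guard around the environment. The sketch below is a project-level convention, not a feature of the library: the `RUN_INTEGRATION` flag, the function names, and the default key list are all hypothetical.

```typescript
// Environment passed in explicitly so the guard is easy to test.
type Env = Record<string, string | undefined>;

// Hypothetical convention: real-API tests run only when RUN_INTEGRATION is exactly "1".
function integrationEnabled(env: Env): boolean {
  return env['RUN_INTEGRATION'] === '1';
}

// Fail fast if a real key is visible to the mocked test pool.
// OPENAI_API_KEY is only an example name; list whichever keys your project uses.
function assertNoRealKeys(env: Env, keyNames: string[] = ['OPENAI_API_KEY']): void {
  if (integrationEnabled(env)) return; // integration runs are allowed to see keys
  const leaked = keyNames.filter((name) => (env[name] ?? '') !== '');
  if (leaked.length > 0) {
    throw new Error(`Real API keys visible to mocked tests: ${leaked.join(', ')}`);
  }
}

// Mocked pool with a leaked key: the guard should throw.
let guardFired = false;
try {
  assertNoRealKeys({ OPENAI_API_KEY: 'sk-example' });
} catch {
  guardFired = true;
}
```

Calling a guard like this from a global test setup file, with the process environment as its argument, turns a forgotten mock into an immediate test failure rather than a bill.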

Next steps

Quick Start § mocks-first — the workflow recommendation
MCP integration — mockMcpClient shape + production swap
Memory guide — InMemoryStore for ephemeral test scenarios