Pause / Resume

An agent processes a refund request. Mid-run, the LLM calls askOperator({ question: 'approve $500 refund?' }). The agent has to wait for a human — could be 30 seconds, could be 3 hours, could be tomorrow. You can’t keep the process running. The framework hands you back a JSON checkpoint; you persist it (Redis, Postgres, S3); when the human responds, you resume. Different process, different server — same conversation.

What “pausable” means here

Two halves:

Pause: a tool calls pauseHere(...) (or the agent uses the built-in askHuman tool), and the framework throws a PauseRequest. The agent loop catches it, and agent.run() returns a RunnerPauseOutcome containing a JSON-serializable checkpoint and the pause data.
Resume: your code persists the checkpoint anywhere. When the human’s reply is ready, agent.resume(checkpoint, humanAnswer) rebuilds the state, returns the answer to the paused tool, and continues the agent loop from exactly where it stopped.

The checkpoint is JSON — no functions, no class instances, no closures. Cross-server safe.
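
To make "JSON only" concrete, here is a minimal sketch of a check you could run on your own pause data before handing it to the framework. isJsonSafe is a hypothetical helper written for this guide, not a framework export. It assumes that "JSON-serializable" means plain objects, arrays, strings, finite numbers, booleans, and null.

```typescript
// Hypothetical helper (not part of the framework): reports whether a value
// is made only of data that survives JSON.stringify -> JSON.parse intact.
function isJsonSafe(value: unknown): boolean {
  if (value === null) return true;
  switch (typeof value) {
    case 'string':
    case 'boolean':
      return true;
    case 'number':
      return Number.isFinite(value); // NaN and Infinity serialize as null
    case 'object': {
      if (Array.isArray(value)) return value.every((item) => isJsonSafe(item));
      // Only plain objects pass; Date, Map, and custom class instances do not.
      const proto = Object.getPrototypeOf(value);
      if (proto !== Object.prototype && proto !== null) return false;
      const record = value as Record<string, unknown>;
      return Object.keys(record).every((key) => isJsonSafe(record[key]));
    }
    default:
      return false; // function, undefined, symbol, bigint
  }
}

const okData = isJsonSafe({ question: 'approve $500 refund?', severity: 'high' });
const badData = isJsonSafe({ question: 'approve?', onAnswer: () => 'closure' });
```

Here okData is true and badData is false, because the second object carries a closure.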

A pausing tool

return Agent.create({
  provider: provider ?? exampleProvider('feature'),
  model: 'mock',
})
  .system('You process refunds. Use askOperator to request approval.')
  .tool({
    schema: {
      name: 'askOperator',
      description: 'Ask a human operator for approval.',
      inputSchema: {
        type: 'object',
        properties: { question: { type: 'string' } },
        required: ['question'], // execute reads args.question unconditionally
      },
    },
    execute: (args) => {
      const q = (args as { question: string }).question;
      // pauseHere throws a PauseRequest; the Agent catches it,
      // captures the checkpoint, and surfaces a RunnerPauseOutcome
      // up to whoever called .run().
      pauseHere({ question: q, severity: 'high' });
      return ''; // unreachable — pauseHere always throws
    },
  })
  .build();

pauseHere({ question, severity }) throws a special PauseRequest. The agent catches it, captures the checkpoint, and surfaces a RunnerPauseOutcome to whoever called .run().

The tool’s execute never returns normally, and that is intended: pauseHere always throws. The tool’s result arrives later, when resume() hands the human’s answer back to the paused tool.
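
The throw-and-catch mechanism can be illustrated with a toy model. This is a sketch of the general pattern only: ToyPauseRequest, toyPauseHere, and runTool are made-up names, and the framework's real loop (which also captures a checkpoint) is not shown here.

```typescript
// Toy model only: these names echo the guide but are NOT the framework's code.
class ToyPauseRequest extends Error {
  readonly data: unknown;
  constructor(data: unknown) {
    super('pause requested');
    // Keeps instanceof working even when compiled to older JS targets.
    Object.setPrototypeOf(this, ToyPauseRequest.prototype);
    this.data = data;
  }
}

// Return type `never` tells the compiler this function always throws.
function toyPauseHere(data: unknown): never {
  throw new ToyPauseRequest(data);
}

type ToyOutcome =
  | { paused: true; pauseData: unknown }
  | { paused: false; output: string };

// Stand-in for the agent loop: runs one tool and converts a pause throw
// into an ordinary return value for the caller.
function runTool(execute: () => string): ToyOutcome {
  try {
    return { paused: false, output: execute() };
  } catch (err) {
    if (err instanceof ToyPauseRequest) {
      return { paused: true, pauseData: err.data };
    }
    throw err; // ordinary errors still propagate
  }
}

const toyOutcome = runTool(() => toyPauseHere({ question: 'approve $500 refund?' }));
```

The caller never sees an exception for a pause. It receives a value it can inspect, which is why the result of .run() is enough to tell whether the agent paused.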

Process A → checkpoint → Process B

// Process A
const result = await agent.run({ message: 'refund order 123' });
if (isPaused(result)) {
  // result.checkpoint is JSON-serializable
  await db.save('pauses:' + sessionId, JSON.stringify(result.checkpoint));
  notifyHuman(result.pauseData);
  return; // process A is done
}

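// Process B (later, different server, different day)
// `agent` here is a fresh instance built from the same factory as in Process A.
const saved = await db.get('pauses:' + sessionId);
if (saved == null) throw new Error('no pending pause for session ' + sessionId);
const checkpoint = JSON.parse(saved);
const humanAnswer = { approved: true, amount: 500 };
const finalResult = await agent.resume(checkpoint, humanAnswer);
// finalResult is the agent's final string output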

Build the agent fresh in Process B — same factory function, NOT the same instance. The checkpoint is the only thing that crosses the process boundary.
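
The db.save / db.get calls above are placeholders for your own storage. One possible shape for that layer is sketched below, with a Map standing in for Redis, Postgres, or S3. CheckpointStore, InMemoryCheckpointStore, and claim are hypothetical names, not framework APIs. The claim-once behavior is a design choice of this sketch: it keeps a double-submitted approval from resuming the same checkpoint twice.

```typescript
// Hypothetical storage interface; not provided by the framework.
interface CheckpointStore {
  save(sessionId: string, checkpoint: unknown): Promise<void>;
  claim(sessionId: string): Promise<unknown | undefined>;
}

// In-memory stand-in for a real database, useful for tests and local runs.
class InMemoryCheckpointStore implements CheckpointStore {
  private rows = new Map<string, string>();

  async save(sessionId: string, checkpoint: unknown): Promise<void> {
    // Stored as a JSON string, exactly as it would be in a real database.
    this.rows.set('pauses:' + sessionId, JSON.stringify(checkpoint));
  }

  // Returns the checkpoint at most once, then removes it.
  async claim(sessionId: string): Promise<unknown | undefined> {
    const key = 'pauses:' + sessionId;
    const raw = this.rows.get(key);
    if (raw === undefined) return undefined;
    this.rows.delete(key);
    return JSON.parse(raw);
  }
}
```

In Process B you would call claim(sessionId) and pass the result to agent.resume(...) only when it is not undefined. A real backend would need an atomic read-and-delete to give the same guarantee across servers.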

When to use pause/resume vs error vs callback

Situation	Use
Long-running approval workflow	pause/resume (this guide)
Synchronous tool error → LLM retries	tool throws; see Error handling
External webhook delivers the awaited answer	pause/resume; the webhook handler calls agent.resume()
Background task fires multiple times	not an agent — use a queue + per-job agent

Anti-patterns

Don’t store closures in the checkpoint. The serializer rejects them. Build the agent fresh in Process B from the same factory.
Don’t pass the live agent instance across processes. Pass the checkpoint. The framework rebuilds state from it.
Don’t poll the agent for “is it paused yet?” — the result of .run() tells you. isPaused(result) is a typed predicate.
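
For readers unfamiliar with typed predicates, the sketch below shows how a TypeScript type guard narrows a result without casts. The shapes are assumptions for illustration: ToyRunResult, ToyPauseOutcome, and toyIsPaused are made-up stand-ins, and the framework's actual isPaused and result types may look different.

```typescript
// Assumed shapes for illustration; the framework's real types may differ.
interface ToyPauseOutcome {
  checkpoint: unknown;
  pauseData: unknown;
}
type ToyRunResult = string | ToyPauseOutcome;

// A user-defined type guard: returning true narrows the argument's type.
function toyIsPaused(result: ToyRunResult): result is ToyPauseOutcome {
  return typeof result === 'object' && result !== null && 'checkpoint' in result;
}

function summarize(result: ToyRunResult): string {
  if (toyIsPaused(result)) {
    // Narrowed: pauseData is accessible here without a cast.
    return 'paused: ' + JSON.stringify(result.pauseData);
  }
  // Narrowed the other way: result is a plain string here.
  return 'done: ' + result.toUpperCase();
}
```

One call on the returned value decides the branch, so there is no state to poll.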

Next steps

Error handling — what’s recoverable via retry vs what needs human escalation
Security guide — permission-gated pauses for sensitive operations
v2.5 will add agent.resumeOnError(checkpoint, input) — auto-checkpointing on uncaught errors so any failure becomes resumable, not just intentional pauses. See the roadmap.