Pause / Resume

Human-in-the-loop with JSON-checkpointed state. Pause for hours mid-run via askHuman or pauseHere; resume in a different process, on a different day, or on a different server.

An agent processes a refund request. Mid-run, the LLM calls askOperator({ question: 'approve $500 refund?' }). The agent has to wait for a human — could be 30 seconds, could be 3 hours, could be tomorrow. You can't keep the process running. The framework hands you back a JSON checkpoint; you persist it (Redis, Postgres, S3); when the human responds, you resume. Different process, different server — same conversation.

What "pausable" means here

Two halves:

  1. Pause — a tool calls pauseHere(...) (or the agent uses the built-in askHuman tool); the framework throws a PauseRequest; the agent loop catches it; agent.run() returns a RunnerPauseOutcome containing a JSON-serializable checkpoint + the pause data.
  2. Resume — your code persists the checkpoint anywhere; when the human's reply is ready, agent.resume(checkpoint, humanAnswer) re-builds the state, returns the answer to the paused tool, and continues the agent loop from exactly where it stopped.
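The two halves can be modeled in a few lines. This is a toy sketch of the throw-to-pause idea, not the framework's implementation: ToyPauseRequest, toyPauseHere, runSteps, and resumeSteps are hypothetical names invented for illustration, and a list of step functions stands in for the agent loop.

```typescript
// Toy model of throw-to-pause. Every name here is hypothetical, not the framework's API.
class ToyPauseRequest {
  readonly data: unknown;
  constructor(data: unknown) {
    this.data = data;
  }
}

// Return type `never`: this function always throws.
function toyPauseHere(data: unknown): never {
  throw new ToyPauseRequest(data);
}

type Step = (input: string) => string;

interface ToyCheckpoint {
  stepIndex: number; // index of the step that paused
  history: string[]; // outputs of the steps that completed
}

type ToyOutcome =
  | { paused: true; checkpoint: ToyCheckpoint; pauseData: unknown }
  | { paused: false; output: string };

function runFrom(steps: Step[], start: number, history: string[]): ToyOutcome {
  const done = [...history];
  for (let i = start; i < steps.length; i++) {
    try {
      done.push(steps[i](done[done.length - 1] ?? ""));
    } catch (err) {
      if (err instanceof ToyPauseRequest) {
        // Pause: hand back plain data only, nothing live.
        return { paused: true, checkpoint: { stepIndex: i, history: done }, pauseData: err.data };
      }
      throw err; // real errors still propagate
    }
  }
  return { paused: false, output: done[done.length - 1] ?? "" };
}

function runSteps(steps: Step[]): ToyOutcome {
  return runFrom(steps, 0, []);
}

// Resume: the human's answer stands in for the paused step's return value.
function resumeSteps(steps: Step[], checkpoint: ToyCheckpoint, answer: string): ToyOutcome {
  return runFrom(steps, checkpoint.stepIndex + 1, [...checkpoint.history, answer]);
}

const steps: Step[] = [
  () => "refund requested",
  () => toyPauseHere({ question: "approve $500 refund?" }),
  (answer) => `operator said: ${answer}`,
];

const first = runSteps(steps);
// Simulate crossing a process boundary with a JSON round trip.
const restored: ToyCheckpoint | null = first.paused
  ? JSON.parse(JSON.stringify(first.checkpoint))
  : null;
const second = restored ? resumeSteps(steps, restored, "approved") : first;
```

The point of the sketch is that the paused step never returns normally; its "return value" is whatever the resume call supplies, and everything needed to continue lives in plain data.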

The checkpoint is JSON — no functions, no class instances, no closures. Cross-server safe.
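To see why functions, class instances, and closures cannot live in a checkpoint, it can help to test your own pause data before handing it to pauseHere. The helper below is a hypothetical illustration (isJsonRoundTripSafe is not part of the framework); it checks whether a value would survive JSON.stringify followed by JSON.parse without losing anything.

```typescript
// Hypothetical helper: true when a value survives a JSON round trip unchanged.
function isJsonRoundTripSafe(value: unknown): boolean {
  if (value === null) return true;
  switch (typeof value) {
    case "string":
    case "boolean":
      return true;
    case "number":
      return isFinite(value); // NaN and Infinity serialize as null
    case "object": {
      if (Array.isArray(value)) return value.every((item) => isJsonRoundTripSafe(item));
      // Only plain objects qualify; Date, Map, and class instances lose their type.
      const proto = Object.getPrototypeOf(value);
      if (proto !== Object.prototype && proto !== null) return false;
      const record = value as Record<string, unknown>;
      return Object.keys(record).every((key) => isJsonRoundTripSafe(record[key]));
    }
    default:
      return false; // undefined, function, symbol, bigint
  }
}

const plain = isJsonRoundTripSafe({ question: "approve $500 refund?", severity: "high" }); // true
const withClosure = isJsonRoundTripSafe({ onApprove: () => "ok" }); // false
```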

A pausing tool

return Agent.create({
  provider: provider ?? exampleProvider('feature'),
  model: 'mock',
})
  .system('You process refunds. Use askOperator to request approval.')
  .tool({
    schema: {
      name: 'askOperator',
      description: 'Ask a human operator for approval.',
      inputSchema: {
        type: 'object',
        properties: { question: { type: 'string' } },
      },
    },
    execute: (args) => {
      const q = (args as { question: string }).question;
      // pauseHere throws a PauseRequest; the Agent catches it,
      // captures the checkpoint, and surfaces a RunnerPauseOutcome
      // up to whoever called .run().
      pauseHere({ question: q, severity: 'high' });
      return ''; // unreachable: pauseHere always throws
    },
  })
  .build();

pauseHere({ question, severity }) throws a special PauseRequest. The agent catches it, captures the checkpoint, and surfaces a RunnerPauseOutcome up to whoever called .run().

The tool's execute looks like it never returns — that's correct. pauseHere always throws. The "return" happens later via resume().

Process A → checkpoint → Process B

// Process A
const result = await agent.run({ message: 'refund order 123' });
if (isPaused(result)) {
  // result.checkpoint is JSON-serializable
  await db.save('pauses:' + sessionId, JSON.stringify(result.checkpoint));
  notifyHuman(result.pauseData);
  return; // process A is done
}

// Process B (later, different server, different day)
const checkpoint = JSON.parse(await db.get('pauses:' + sessionId));
const humanAnswer = { approved: true, amount: 500 };
const finalResult = await agent.resume(checkpoint, humanAnswer);
// finalResult is the agent's final string output

Build the agent fresh in Process B — same factory function, NOT the same instance. The checkpoint is the only thing that crosses the process boundary.
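Since only the checkpoint crosses the boundary, it can be convenient to hide persistence behind a small interface that both processes share. The sketch below uses hypothetical names (CheckpointStore, InMemoryCheckpointStore); the Map stands in for Redis, Postgres, or S3, and the 'pauses:' key prefix follows the example above.

```typescript
// Hypothetical storage interface; a real one would wrap Redis, Postgres, or S3.
interface CheckpointStore {
  save(sessionId: string, checkpoint: unknown): void;
  // Loads and deletes in one step; returns undefined when nothing is stored.
  take(sessionId: string): unknown;
}

class InMemoryCheckpointStore implements CheckpointStore {
  private readonly rows = new Map<string, string>();

  save(sessionId: string, checkpoint: unknown): void {
    // Store the serialized string, exactly what would cross the process boundary.
    this.rows.set("pauses:" + sessionId, JSON.stringify(checkpoint));
  }

  take(sessionId: string): unknown {
    const key = "pauses:" + sessionId;
    const raw = this.rows.get(key);
    if (raw === undefined) return undefined;
    this.rows.delete(key); // assumption of this sketch: resume a checkpoint at most once
    return JSON.parse(raw);
  }
}

const store = new InMemoryCheckpointStore();
store.save("session-1", { stepIndex: 1 });
const firstTake = store.take("session-1");
const secondTake = store.take("session-1");
```

Deleting on read means a duplicate human reply cannot resume the same checkpoint twice. That is a design choice of this sketch, not something the framework is known to require.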

When to use pause/resume vs error vs callback

| Situation | Use |
| --- | --- |
| Long-running approval workflow | pause/resume (this guide) |
| Synchronous tool error → LLM retries | tool throws; see Error handling |
| External webhook → trigger something | pause/resume + webhook handler calls agent.resume() |
| Background task fires multiple times | not an agent; use a queue + per-job agent |
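For the webhook case, the handler's job is small: look up the stored checkpoint, build a fresh agent, and call resume. The sketch below is framework-agnostic and rests on assumptions: ResumableAgent, WebhookDeps, and handleApprovalWebhook are hypothetical names, and the only thing taken from this guide is that an agent exposes resume(checkpoint, humanAnswer).

```typescript
// Minimal shape this sketch needs from an agent; the real type comes from the framework.
interface ResumableAgent {
  resume(checkpoint: unknown, answer: unknown): Promise<unknown>;
}

// Hypothetical dependencies, injected so the handler stays testable.
interface WebhookDeps {
  loadCheckpoint(sessionId: string): Promise<string | null>; // raw JSON, or null if absent
  buildAgent(): ResumableAgent; // the same factory Process A used
}

type WebhookResult =
  | { status: 404; error: string }
  | { status: 200; result: unknown };

// Hypothetical handler body; wire it into whatever HTTP framework you use.
async function handleApprovalWebhook(
  deps: WebhookDeps,
  body: { sessionId: string; answer: unknown }
): Promise<WebhookResult> {
  const raw = await deps.loadCheckpoint(body.sessionId);
  if (raw === null) {
    return { status: 404, error: "no paused run for session " + body.sessionId };
  }
  const agent = deps.buildAgent(); // fresh instance, never a shared live one
  const result = await agent.resume(JSON.parse(raw), body.answer);
  return { status: 200, result };
}

// Stubs for illustration only.
const stubDeps: WebhookDeps = {
  loadCheckpoint: async (id) => (id === "session-1" ? '{"stepIndex":1}' : null),
  buildAgent: () => ({
    resume: async (_checkpoint, answer) => "resumed with " + JSON.stringify(answer),
  }),
};
```

Returning a 404-style result for an unknown session keeps a late or duplicate webhook from being mistaken for a fresh run.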

Anti-patterns

  • Don't store closures in the checkpoint. The serializer rejects them. Build the agent fresh in Process B from the same factory.
  • Don't pass the live agent instance across processes. Pass the checkpoint. The framework rebuilds state from it.
  • Don't poll the agent for "is it paused yet?" — the result of .run() tells you. isPaused(result) is a typed predicate.
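What a typed predicate buys you is compile-time narrowing: inside the if branch, TypeScript knows the result carries a checkpoint. The shapes below are assumptions made for illustration (PauseOutcomeSketch, RunResultSketch, and isPausedSketch are hypothetical stand-ins; the framework's real RunnerPauseOutcome and isPaused may differ).

```typescript
// Assumed shapes for illustration; the framework's real types may differ.
interface PauseOutcomeSketch {
  checkpoint: unknown;
  pauseData: unknown;
}

type RunResultSketch = string | PauseOutcomeSketch;

// Type predicate: when this returns true, TypeScript narrows `result`.
function isPausedSketch(result: RunResultSketch): result is PauseOutcomeSketch {
  return typeof result === "object" && result !== null && "checkpoint" in result;
}

function summarize(result: RunResultSketch): string {
  if (isPausedSketch(result)) {
    // Narrowed to PauseOutcomeSketch: pauseData is accessible without a cast.
    return "paused: " + JSON.stringify(result.pauseData);
  }
  // Narrowed to string here.
  return "done: " + result;
}

const finished = summarize("refund issued");
const waiting = summarize({ checkpoint: {}, pauseData: { question: "approve?" } });
```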

Next steps

  • Error handling — what's recoverable via retry vs what needs human escalation
  • Security guide — permission-gated pauses for sensitive operations
  • Reliability guide: agent.resumeOnError(checkpoint) auto-checkpoints on an uncaught mid-run error (the failure throws a RunCheckpointError carrying a JSON-serializable checkpoint), so any failure becomes resumable, not just intentional pauses.
