
Pause / Resume

An agent processes a refund request. Mid-run, the LLM calls askOperator({ question: 'approve $500 refund?' }). The agent has to wait for a human — could be 30 seconds, could be 3 hours, could be tomorrow. You can’t keep the process running. The framework hands you back a JSON checkpoint; you persist it (Redis, Postgres, S3); when the human responds, you resume. Different process, different server — same conversation.

Two halves:

  1. Pause — a tool calls pauseHere(...) (or the agent uses the built-in askHuman tool); the framework throws a PauseRequest; the agent loop catches it; agent.run() returns a RunnerPauseOutcome containing a JSON-serializable checkpoint + the pause data.
  2. Resume — your code persists the checkpoint anywhere; when the human’s reply is ready, agent.resume(checkpoint, humanAnswer) re-builds the state, returns the answer to the paused tool, and continues the agent loop from exactly where it stopped.
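The throw-and-catch mechanism behind these two halves can be sketched in a few lines. This is a toy model, not the framework's implementation: `PauseRequest`, `pauseHere`, `runCalls`, `resumeCalls`, and the `Outcome` and `Checkpoint` shapes below are hypothetical stand-ins, and a fixed list of tool calls takes the place of the LLM-driven agent loop.

```typescript
// Hypothetical sketch: every definition here is illustrative, not framework code.
class PauseRequest {
  readonly data: unknown;
  constructor(data: unknown) {
    this.data = data;
  }
}

// Always throws, so the return type is `never`.
function pauseHere(data: unknown): never {
  throw new PauseRequest(data);
}

type ToolCall = { tool: string; args: unknown };
type Tool = (args: unknown) => string;
type Checkpoint = { outputs: string[]; pendingIndex: number };
type Outcome =
  | { paused: false; outputs: string[] }
  | { paused: true; checkpoint: Checkpoint; pauseData: unknown };

// Runs tool calls in order from `startAt`. A PauseRequest stops the loop and is
// turned into a plain-data outcome instead of propagating to the caller.
function runCalls(
  calls: ToolCall[],
  tools: Record<string, Tool>,
  outputs: string[] = [],
  startAt = 0,
): Outcome {
  const done = [...outputs];
  for (let i = startAt; i < calls.length; i++) {
    const call = calls[i];
    try {
      done.push(tools[call.tool](call.args));
    } catch (err) {
      if (err instanceof PauseRequest) {
        const checkpoint: Checkpoint = { outputs: done, pendingIndex: i };
        return { paused: true, checkpoint, pauseData: err.data };
      }
      throw err; // ordinary errors are not pauses
    }
  }
  return { paused: false, outputs: done };
}

// Resume: the human's answer stands in for the paused tool's return value,
// and the loop continues with the call after the paused one.
function resumeCalls(
  calls: ToolCall[],
  tools: Record<string, Tool>,
  checkpoint: Checkpoint,
  answer: string,
): Outcome {
  return runCalls(calls, tools, [...checkpoint.outputs, answer], checkpoint.pendingIndex + 1);
}

const demoTools: Record<string, Tool> = {
  lookup: () => "order 123: $500",
  askOperator: (args) => pauseHere(args),
};
```

In this sketch the checkpoint holds only an array of strings and an index, so it can pass through `JSON.stringify` and `JSON.parse` between the pause and the resume without losing anything.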

The checkpoint is JSON — no functions, no class instances, no closures. Cross-server safe.
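To make "no functions, no class instances, no closures" concrete, here is a conservative check for whether a value survives a JSON round trip unchanged. `isJsonSafe` is a hypothetical helper written for this illustration, not part of the framework, and the framework's own serializer may apply different rules.

```typescript
// Hypothetical helper: true only for values built from plain objects, arrays,
// strings, booleans, finite numbers, and null.
function isJsonSafe(value: unknown): boolean {
  if (value === null) return true;
  switch (typeof value) {
    case "string":
    case "boolean":
      return true;
    case "number":
      // NaN and Infinity are serialized as null, so they do not round-trip.
      return isFinite(value as number);
    case "object": {
      if (Array.isArray(value)) return value.every((item) => isJsonSafe(item));
      const proto = Object.getPrototypeOf(value);
      // Anything with a custom prototype (class instance, Date, Map) is rejected.
      if (proto !== Object.prototype && proto !== null) return false;
      const record = value as Record<string, unknown>;
      return Object.keys(record).every((key) => isJsonSafe(record[key]));
    }
    default:
      // function, undefined, symbol, bigint
      return false;
  }
}
```

A check like this could run in a test before a pause payload is persisted, so that a stray callback or `Date` is caught early rather than silently altered by `JSON.stringify`.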

examples/features/01-pause-resume.ts (region: pause-tool)
return Agent.create({
  provider: provider ?? exampleProvider('feature'),
  model: 'mock',
})
  .system('You process refunds. Use askOperator to request approval.')
  .tool({
    schema: {
      name: 'askOperator',
      description: 'Ask a human operator for approval.',
      inputSchema: {
        type: 'object',
        properties: { question: { type: 'string' } },
      },
    },
    execute: (args) => {
      const q = (args as { question: string }).question;
      // pauseHere throws a PauseRequest; the Agent catches it,
      // captures the checkpoint, and surfaces a RunnerPauseOutcome
      // up to whoever called .run().
      pauseHere({ question: q, severity: 'high' });
      return ''; // unreachable: pauseHere always throws
    },
  })
  .build();

pauseHere({ question, severity }) throws a special PauseRequest. The agent catches it, captures the checkpoint, and surfaces a RunnerPauseOutcome up to whoever called .run().

The tool’s execute looks like it never returns — that’s correct. pauseHere always throws. The “return” happens later via resume().

// Process A
const result = await agent.run({ message: 'refund order 123' });
if (isPaused(result)) {
  // result.checkpoint is JSON-serializable
  await db.save('pauses:' + sessionId, JSON.stringify(result.checkpoint));
  notifyHuman(result.pauseData);
  return; // process A is done
}

// Process B (later, different server, different day)
const checkpoint = JSON.parse(await db.get('pauses:' + sessionId));
const humanAnswer = { approved: true, amount: 500 };
const finalResult = await agent.resume(checkpoint, humanAnswer);
// finalResult is the agent's final string output

Build the agent fresh in Process B — same factory function, NOT the same instance. The checkpoint is the only thing that crosses the process boundary.
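Since the checkpoint is the only thing that crosses the process boundary, the storage layer can stay very small. The sketch below assumes a hypothetical `CheckpointStore` interface with an in-memory implementation; the interface, the class, and the `take` method are names invented for this example, and a real store (Redis, Postgres, S3) would be asynchronous. Deleting the row on read is a design choice of this sketch, made so that one checkpoint cannot be resumed twice.

```typescript
// Hypothetical store interface; not part of the framework.
interface CheckpointStore {
  save(sessionId: string, checkpoint: unknown): void;
  // Returns the parsed checkpoint, or undefined when none is stored.
  take(sessionId: string): unknown;
}

class InMemoryCheckpointStore implements CheckpointStore {
  private readonly rows = new Map<string, string>();

  save(sessionId: string, checkpoint: unknown): void {
    // Keep the serialized form, as a real database would.
    this.rows.set("pauses:" + sessionId, JSON.stringify(checkpoint));
  }

  // Load and delete in one step so a checkpoint is handed out at most once.
  take(sessionId: string): unknown {
    const key = "pauses:" + sessionId;
    const raw = this.rows.get(key);
    if (raw === undefined) return undefined;
    this.rows.delete(key);
    return JSON.parse(raw);
  }
}
```

Process A would call `save` right after `isPaused(result)` succeeds; Process B would call `take`, build a fresh agent from the factory, and pass the returned value to `agent.resume`.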

When to use pause/resume vs error vs callback

Situation | Use
--- | ---
Long-running approval workflow | pause/resume (this guide)
Synchronous tool error → LLM retries | tool throws; see Error handling
External webhook → trigger something | pause/resume + webhook handler calls agent.resume()
Background task fires multiple times | not an agent; use a queue + per-job agent
  • Don’t store closures in the checkpoint. The serializer rejects them. Build the agent fresh in Process B from the same factory.
  • Don’t pass the live agent instance across processes. Pass the checkpoint. The framework rebuilds state from it.
  • Don’t poll the agent for “is it paused yet?” — the result of .run() tells you. isPaused(result) is a typed predicate.
  • Error handling — what’s recoverable via retry vs what needs human escalation
  • Security guide — permission-gated pauses for sensitive operations
  • v2.5 will add agent.resumeOnError(checkpoint, input) — auto-checkpointing on uncaught errors so any failure becomes resumable, not just intentional pauses. See the roadmap.