Streaming

A user types a question into your chat UI. They expect the answer to appear word by word, as in ChatGPT, not as one big block five seconds later. Behind the scenes, the agent uses that same response to decide whether to call a tool. One LLM call, two consumers: the UI renders tokens as they arrive, and the agent loop reads the final LLMResponse to drive its ReAct decisions.

How streaming works

Every LLM provider implements two methods:

Method	Returns	Use
`complete(req)`	`LLMResponse`	One-shot, full response at end
`stream(req)`	`AsyncIterable<LLMChunk>`	Token-by-token chunks ending with a final chunk that carries the full LLMResponse

The agent calls stream() when streaming is active. Each chunk has tokenIndex, content, and done. The final chunk has done: true AND response: LLMResponse — the full authoritative shape with toolCalls, usage, stopReason. This single round-trip serves BOTH the UI’s appetite for token-by-token feedback AND the agent loop’s need for the structured response.
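The two-consumer pattern described above can be sketched as a single loop over the stream. This is a minimal illustration, not the library's implementation: the `LLMChunk` and `LLMResponse` shapes are inferred from the field names in the text, and `ToolCall`, `fakeStream`, and `consume` are hypothetical names introduced for the example. It also assumes the final chunk carries no token text of its own.

```typescript
// Shapes inferred from the description above; the real types may carry more fields.
interface ToolCall { name: string; args: Record<string, unknown> }
interface LLMResponse {
  content: string;
  toolCalls: ToolCall[];
  stopReason: string;
}
interface LLMChunk {
  tokenIndex: number;
  content: string;
  done: boolean;
  response?: LLMResponse; // present only on the final chunk
}

// Hypothetical stand-in for provider.stream(req): yields word chunks, then one final chunk.
async function* fakeStream(text: string): AsyncIterable<LLMChunk> {
  const words = text.split(' ');
  for (let i = 0; i < words.length; i++) {
    const isLast = i === words.length - 1;
    yield { tokenIndex: i, content: words[i] + (isLast ? '' : ' '), done: false };
  }
  yield {
    tokenIndex: words.length,
    content: '',
    done: true,
    response: { content: text, toolCalls: [], stopReason: 'stop' },
  };
}

// One pass serves both consumers: onToken feeds the UI, the return value feeds the agent loop.
async function consume(
  stream: AsyncIterable<LLMChunk>,
  onToken: (content: string) => void,
): Promise<LLMResponse> {
  let final: LLMResponse | undefined;
  for await (const chunk of stream) {
    if (chunk.done) {
      final = chunk.response; // the structured response, including toolCalls
    } else {
      onToken(chunk.content);
    }
  }
  if (!final) throw new Error('stream ended without a final chunk');
  return final;
}
```

A caller would pass a render callback as `onToken` and inspect the returned `toolCalls` only after the loop completes, which is why tool dispatch never needs to look at individual token chunks.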

Listen to streamed events

Every runner exposes typed events. Subscribe to agentfootprint.stream.* for the streaming surface:

agent.on('agentfootprint.stream.tool_start', (e) =>
  console.log(`→ tool ${e.payload.toolName}(${JSON.stringify(e.payload.args)})`),
);
agent.on('agentfootprint.stream.tool_end', (e) =>
  console.log(`← tool result: ${e.payload.result}`),
);

The event taxonomy:

stream.llm_start — provider call about to start (model, request shape)
stream.token — one token chunk arrived (content, tokenIndex)
stream.llm_end — final chunk processed (full response, usage, stopReason)
stream.tool_start — agent about to dispatch a tool (name, args)
stream.tool_end — tool returned (result or error)
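If you handle several of these events in one place, a discriminated union keeps the handler exhaustive. The sketch below is an assumption-heavy illustration: the text only shows `e.payload`, so the `type` discriminant, the `model`, `stopReason`, and `error` payload fields, and the `describe` helper are hypothetical, and the sample values (`example-model`, `tool_use`) are made up. The ordering in `toolTurn` is one plausible sequence for a turn that calls a single tool.

```typescript
// Payload fields marked "assumed" are not named in the text above.
type StreamEvent =
  | { type: 'agentfootprint.stream.llm_start'; payload: { model: string } } // assumed
  | { type: 'agentfootprint.stream.token'; payload: { content: string; tokenIndex: number } }
  | { type: 'agentfootprint.stream.llm_end'; payload: { stopReason: string } } // assumed
  | { type: 'agentfootprint.stream.tool_start'; payload: { toolName: string; args: unknown } }
  | { type: 'agentfootprint.stream.tool_end'; payload: { result?: unknown; error?: string } }; // error: assumed

// Exhaustive switch: the compiler flags any event type that is left unhandled.
function describe(e: StreamEvent): string {
  switch (e.type) {
    case 'agentfootprint.stream.llm_start':
      return `llm call starting (${e.payload.model})`;
    case 'agentfootprint.stream.token':
      return `token #${e.payload.tokenIndex}: ${e.payload.content}`;
    case 'agentfootprint.stream.llm_end':
      return `llm call finished (${e.payload.stopReason})`;
    case 'agentfootprint.stream.tool_start':
      return `tool ${e.payload.toolName} starting`;
    case 'agentfootprint.stream.tool_end':
      return e.payload.error ? `tool failed: ${e.payload.error}` : 'tool returned';
    default: {
      const unreachable: never = e;
      return unreachable;
    }
  }
}

// Illustrative sequence: the LLM streams, finishes, and only then is the tool dispatched.
const toolTurn: StreamEvent[] = [
  { type: 'agentfootprint.stream.llm_start', payload: { model: 'example-model' } },
  { type: 'agentfootprint.stream.token', payload: { content: 'Let me check.', tokenIndex: 0 } },
  { type: 'agentfootprint.stream.llm_end', payload: { stopReason: 'tool_use' } },
  { type: 'agentfootprint.stream.tool_start', payload: { toolName: 'search', args: { q: 'weather' } } },
  { type: 'agentfootprint.stream.tool_end', payload: { result: 'sunny' } },
];
const lines: string[] = toolTurn.map(describe);
```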

SSE for browser clients

Pipe stream events to Server-Sent Events for browser delivery. Wire your HTTP framework’s response writer to the events:

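// Express / Hono / Fastify
// Assumes a Node-style `res`; adapt the header and write calls to your framework.
// Without the text/event-stream content type, EventSource rejects the response.
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');

// Each SSE message is a `data:` line terminated by a blank line.
agent.on('agentfootprint.stream.token', (e) => {
  res.write(`data: ${JSON.stringify({ type: 'token', content: e.payload.content })}\n\n`);
});
agent.on('agentfootprint.stream.tool_start', (e) => {
  res.write(`data: ${JSON.stringify({ type: 'tool', name: e.payload.toolName })}\n\n`);
});
// Send a named `done` event, then close the response.
agent.on('agentfootprint.stream.llm_end', () => {
  res.write(`event: done\ndata: {}\n\n`);
  res.end();
});

await agent.run({ message: userInput });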

The browser consumes the stream with EventSource and renders tokens into the UI as they arrive.
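On the client, the rendering logic can be kept separate from the transport. The sketch below is a hypothetical reducer, not part of the library: `ServerMessage`, `ChatState`, and `applyMessage` are names introduced here, and the message shapes simply mirror the JSON written by the server handlers above.

```typescript
// Message shapes mirror the JSON payloads written by the server handlers above.
type ServerMessage =
  | { type: 'token'; content: string }
  | { type: 'tool'; name: string };

interface ChatState {
  text: string;        // answer text accumulated so far
  activeTool?: string; // last tool the agent reported starting
}

// Pure reducer: takes the `data` string of one SSE message and returns the next UI state.
function applyMessage(state: ChatState, data: string): ChatState {
  const msg = JSON.parse(data) as ServerMessage;
  switch (msg.type) {
    case 'token':
      return { ...state, text: state.text + msg.content };
    case 'tool':
      return { ...state, activeTool: msg.name };
    default:
      return state; // ignore message types this client does not know
  }
}
```

In a browser this would typically be wired up as `source.onmessage = (e) => { state = applyMessage(state, e.data); }`, with a listener for the named `done` event calling `source.close()` so that EventSource does not try to reconnect after the server ends the response.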

When streaming isn’t useful

Cost-sensitive batch jobs: complete() carries slightly less per-call overhead than stream() on most providers, though the difference is negligible.
Tool-heavy ReAct loops: tool dispatch happens at stream.llm_end, not during token chunks. If your UI doesn’t render tokens, complete() is simpler.
Tests / mocks: mock({ reply: 'X' }) works in both modes; pick whichever the test asserts on.

Anti-patterns

Don’t try to dispatch tools off stream.token. Tool calls land in the final chunk’s response.toolCalls. Token chunks are content-only.
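Don’t keep the SSE connection open after stream.llm_end. End it. The browser’s EventSource reconnects on its own whenever the connection closes, so have the client call close() once it receives the done event unless a reconnect is what you want.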
Don’t run two stream()s concurrently on the same agent. Build one, run one.
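One way to enforce the last rule in application code is a small guard around the runner. The sketch below is a hypothetical helper, not a library feature: `Runnable` and `singleFlight` are names introduced here, and the only assumption about the agent is that it exposes an async `run(input)` method as in the SSE example.

```typescript
// Minimal structural type for anything with an async run() method.
interface Runnable<I, O> {
  run(input: I): Promise<O>;
}

// Wraps a runner so that a second run() rejects while the first is still in flight.
function singleFlight<I, O>(inner: Runnable<I, O>): Runnable<I, O> {
  let busy = false;
  return {
    async run(input: I): Promise<O> {
      if (busy) {
        throw new Error('run already in progress; build a separate agent for concurrent work');
      }
      busy = true;
      try {
        return await inner.run(input);
      } finally {
        busy = false; // released on both success and failure
      }
    },
  };
}
```

Rejecting loudly is a deliberate choice in this sketch: queuing the second call would hide the mistake, whereas an error points the caller toward building a second agent.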

Next steps

Observability guide — the full 47-event taxonomy + recorders
Mock provider — mock({ reply }) streams word-by-word for $0 dev