Streaming

A user types a question into your chat UI. They expect the answer to appear word by word, as in ChatGPT, not as one big block five seconds later. Behind the scenes, the agent uses that same response to decide whether to call a tool. One LLM call, two consumers: the UI streams tokens, and the agent loop reads the final LLMResponse to drive its ReAct decisions.

Every LLM provider implements two methods:

Method          Returns                    Use
complete(req)   LLMResponse                One-shot; full response at the end
stream(req)     AsyncIterable<LLMChunk>    Token-by-token chunks, ending with a final chunk that carries the full LLMResponse

The agent calls stream() when streaming is active. Each chunk has tokenIndex, content, and done. The final chunk has done: true AND response: LLMResponse — the full authoritative shape with toolCalls, usage, stopReason. This single round-trip serves BOTH the UI’s appetite for token-by-token feedback AND the agent loop’s need for the structured response.
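
To make the chunk contract concrete, here is a minimal sketch of a single pass over a stream that serves both consumers. The `LLMChunk` and `LLMResponse` shapes are assumptions inferred from the description above, and `fakeStream` and `consume` are hypothetical helpers, not part of the library.

```typescript
// Hypothetical shapes inferred from the prose; the real library types may differ.
interface LLMResponse {
  content: string;
  toolCalls: { name: string; args: unknown }[];
  usage: { inputTokens: number; outputTokens: number };
  stopReason: string;
}

interface LLMChunk {
  tokenIndex: number;
  content: string;
  done: boolean;
  response?: LLMResponse; // assumed to be present only on the final chunk
}

// Stand-in provider: one chunk per token, then a final chunk with the full response.
async function* fakeStream(tokens: string[]): AsyncGenerator<LLMChunk> {
  let i = 0;
  for (const token of tokens) {
    yield { tokenIndex: i++, content: token, done: false };
  }
  yield {
    tokenIndex: i, // index and empty content on the final chunk are assumptions
    content: '',
    done: true,
    response: {
      content: tokens.join(''),
      toolCalls: [],
      usage: { inputTokens: 0, outputTokens: tokens.length },
      stopReason: 'stop',
    },
  };
}

// One loop, two consumers: tokens go to the UI callback,
// and the final structured response goes back to the caller (the agent loop).
async function consume(
  stream: AsyncIterable<LLMChunk>,
  onToken: (text: string) => void,
): Promise<LLMResponse> {
  for await (const chunk of stream) {
    if (chunk.done) {
      if (!chunk.response) throw new Error('final chunk is missing response');
      return chunk.response;
    }
    onToken(chunk.content);
  }
  throw new Error('stream ended without a done chunk');
}
```

Calling `consume(fakeStream(['Hel', 'lo']), render)` would invoke `render` once per token and resolve with the full response, which is where an agent loop would inspect `toolCalls`.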

Every runner exposes typed events. Subscribe to agentfootprint.stream.* for the streaming surface:

examples/core/02-agent-with-tools.ts (region: observe)
agent.on('agentfootprint.stream.tool_start', (e) =>
  console.log(`→ tool ${e.payload.toolName}(${JSON.stringify(e.payload.args)})`),
);
agent.on('agentfootprint.stream.tool_end', (e) =>
  console.log(`← tool result: ${e.payload.result}`),
);

The event taxonomy:

  • stream.llm_start — provider call about to start (model, request shape)
  • stream.token — one token chunk arrived (content, tokenIndex)
  • stream.llm_end — final chunk processed (full response, usage, stopReason)
  • stream.tool_start — agent about to dispatch a tool (name, args)
  • stream.tool_end — tool returned (result or error)
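
One way to picture the taxonomy is as a discriminated union, so that a single handler can narrow on the event type. The payload shapes below are assumptions based only on the field names listed above. `StreamEvent` and `formatEvent` are hypothetical and do not come from the library.

```typescript
// Hypothetical payload shapes, inferred from the field names in the list above.
type StreamEvent =
  | { type: 'llm_start'; model: string }
  | { type: 'token'; content: string; tokenIndex: number }
  | { type: 'llm_end'; stopReason: string }
  | { type: 'tool_start'; toolName: string; args: unknown }
  | { type: 'tool_end'; result?: unknown; error?: string };

// Turns any stream event into a one-line log entry.
// The switch is exhaustive, so adding a new variant is a compile error until it is handled.
function formatEvent(e: StreamEvent): string {
  switch (e.type) {
    case 'llm_start':
      return `llm_start model=${e.model}`;
    case 'token':
      return `token #${e.tokenIndex} ${JSON.stringify(e.content)}`;
    case 'llm_end':
      return `llm_end stopReason=${e.stopReason}`;
    case 'tool_start':
      return `tool_start ${e.toolName}(${JSON.stringify(e.args)})`;
    case 'tool_end':
      return e.error !== undefined
        ? `tool_end error=${e.error}`
        : `tool_end result=${JSON.stringify(e.result)}`;
  }
}
```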

Pipe stream events to Server-Sent Events for browser delivery. Wire your HTTP framework’s response writer to the events:

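// Express / Hono / Fastify
// Assumes `res` is a Node-style response whose Content-Type is already
// set to text/event-stream, and that `agent` was built for this request.

// Forward each token to the browser as a default ("message") SSE event.
agent.on('agentfootprint.stream.token', (e) => {
  res.write(`data: ${JSON.stringify({ type: 'token', content: e.payload.content })}\n\n`);
});
// Let the UI show which tool is about to run.
agent.on('agentfootprint.stream.tool_start', (e) => {
  res.write(`data: ${JSON.stringify({ type: 'tool', name: e.payload.toolName })}\n\n`);
});
// Send a named "done" event, then close the connection.
agent.on('agentfootprint.stream.llm_end', () => {
  res.write(`event: done\ndata: {}\n\n`);
  res.end();
});
await agent.run({ message: userInput });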

The browser consumes via EventSource and renders tokens into the UI as they arrive.
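
A minimal sketch of the browser side follows, assuming the server emits the `token` payloads and the named `done` event shown in the handler above. `attachStream` and `EventSourceLike` are hypothetical names. The small structural interface exists only so the sketch can be exercised without a browser, and a real `EventSource` should fit it.

```typescript
// Minimal structural type covering the two EventSource members this sketch uses.
interface EventSourceLike {
  addEventListener(type: string, listener: (ev: { data: string }) => void): void;
  close(): void;
}

// Accumulates token events and hands the running text to `render`.
function attachStream(source: EventSourceLike, render: (text: string) => void): void {
  let text = '';

  // Frames without an `event:` field arrive as "message" events.
  source.addEventListener('message', (ev) => {
    const msg = JSON.parse(ev.data) as { type: string; content?: string };
    if (msg.type === 'token' && typeof msg.content === 'string') {
      text += msg.content;
      render(text);
    }
  });

  // EventSource reconnects on its own when the server closes the connection,
  // so close it explicitly once the server signals completion.
  source.addEventListener('done', () => {
    source.close();
  });
}
```

In a page this might be wired as `attachStream(new EventSource('/chat/stream'), (t) => { output.textContent = t; })`, where the URL and the `output` element are placeholders.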

Choosing between complete() and stream():

  • Cost-sensitive batch jobs: complete() costs marginally less per call than stream() on most providers, though the difference is negligible.
  • Tool-heavy ReAct loops: tool dispatch happens at stream.llm_end, not during token chunks. If your UI doesn’t render tokens, complete() is simpler.
  • Tests / mocks: mock({ reply: 'X' }) works in both modes; pick whichever the test asserts on.

Common mistakes:

  • Don’t try to dispatch tools off stream.token. Tool calls land in the final chunk’s response.toolCalls. Token chunks are content-only.
  • Don’t keep the SSE connection open after stream.llm_end. End it; the browser’s EventSource will reconnect automatically if needed.
  • Don’t run two stream()s concurrently on the same agent. Build one, run one.