Streaming
A user types a question into your chat UI. They expect the answer to appear word-by-word like ChatGPT — not in one big block five seconds later. Behind the scenes the agent is also using that same response to decide whether to call a tool. One LLM call, two consumers: the UI streams tokens; the agent loop reads the final LLMResponse to drive ReAct decisioning.
How streaming works
Every LLM provider implements two methods:
| Method | Returns | Use |
|---|---|---|
| `complete(req)` | `LLMResponse` | One-shot, full response at end |
| `stream(req)` | `AsyncIterable<LLMChunk>` | Token-by-token chunks ending with a final chunk that carries the full `LLMResponse` |
The agent calls stream() when streaming is active. Each chunk has tokenIndex, content, and done. The final chunk has done: true AND response: LLMResponse — the full authoritative shape with toolCalls, usage, stopReason. This single round-trip serves BOTH the UI’s appetite for token-by-token feedback AND the agent loop’s need for the structured response.
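The "one call, two consumers" idea can be sketched as a single loop over the chunk stream. This is a minimal illustration, not the library's implementation: the `LLMChunk` and `LLMResponse` shapes are assumptions inferred from the fields named above, and `fakeStream` and `consume` are hypothetical helpers.

```typescript
// Assumed shapes, inferred from the description above; the real library types may differ.
interface LLMResponse {
  content: string;
  toolCalls: { name: string; args: unknown }[];
  stopReason: string;
}

interface LLMChunk {
  tokenIndex: number;
  content: string;
  done: boolean;
  response?: LLMResponse; // expected only on the final chunk
}

// Hypothetical stand-in for a provider's stream(): one chunk per word, then a final chunk.
async function* fakeStream(text: string): AsyncIterable<LLMChunk> {
  const words = text.split(" ");
  for (let i = 0; i < words.length; i++) {
    const isLast = i === words.length - 1;
    yield { tokenIndex: i, content: words[i] + (isLast ? "" : " "), done: false };
  }
  yield {
    tokenIndex: words.length,
    content: "",
    done: true,
    response: { content: text, toolCalls: [], stopReason: "stop" },
  };
}

// One pass serves both consumers: onToken feeds the UI,
// and the returned LLMResponse feeds the agent loop.
async function consume(
  chunks: AsyncIterable<LLMChunk>,
  onToken: (content: string) => void,
): Promise<LLMResponse> {
  for await (const chunk of chunks) {
    if (chunk.content) onToken(chunk.content);
    if (chunk.done) {
      if (!chunk.response) throw new Error("final chunk is missing response");
      return chunk.response;
    }
  }
  throw new Error("stream ended without a final chunk");
}
```

A caller would pass a UI callback as `onToken` and inspect `toolCalls` on the returned value to decide the next step.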
Listen to streamed events
Every runner exposes typed events. Subscribe to agentfootprint.stream.* for the streaming surface:
    agent.on('agentfootprint.stream.tool_start', (e) =>
      console.log(`→ tool ${e.payload.toolName}(${JSON.stringify(e.payload.args)})`),
    );
    agent.on('agentfootprint.stream.tool_end', (e) =>
      console.log(`← tool result: ${e.payload.result}`),
    );

The event taxonomy:

- `stream.llm_start`: provider call about to start (model, request shape)
- `stream.token`: one token chunk arrived (content, tokenIndex)
- `stream.llm_end`: final chunk processed (full response, usage, stopReason)
- `stream.tool_start`: agent about to dispatch a tool (name, args)
- `stream.tool_end`: tool returned (result or error)
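One way to put the five events to work is to fold them into a small UI state. The sketch below is illustrative only: the `StreamEvent` union, its `type` discriminant, and the payload shapes are assumptions based on the fields named in the list, and `UiState` and `reduce` are hypothetical names rather than part of the library.

```typescript
// Assumed event shapes; only the payload field names come from the list above.
type StreamEvent =
  | { type: "stream.llm_start"; payload: { model: string } }
  | { type: "stream.token"; payload: { content: string; tokenIndex: number } }
  | { type: "stream.llm_end"; payload: { stopReason: string } }
  | { type: "stream.tool_start"; payload: { toolName: string; args: unknown } }
  | { type: "stream.tool_end"; payload: { result?: unknown; error?: string } };

interface UiState {
  text: string; // assistant text rendered so far
  activeTool: string | null; // tool currently running, if any
  generating: boolean; // true between llm_start and llm_end
}

const initialState: UiState = { text: "", activeTool: null, generating: false };

// Pure reducer: each event yields a new state, so it is easy to test and replay.
function reduce(state: UiState, event: StreamEvent): UiState {
  switch (event.type) {
    case "stream.llm_start":
      return { ...state, generating: true };
    case "stream.token":
      return { ...state, text: state.text + event.payload.content };
    case "stream.llm_end":
      return { ...state, generating: false };
    case "stream.tool_start":
      return { ...state, activeTool: event.payload.toolName };
    case "stream.tool_end":
      return { ...state, activeTool: null };
  }
}
```

Keeping the reducer pure means the same function can drive a live UI and replay a recorded event log.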
SSE for browser clients
Pipe stream events to Server-Sent Events for browser delivery. Wire your HTTP framework’s response writer to the events:
    // Express / Hono / Fastify
    // Send the Content-Type: text/event-stream header before the first write.
    agent.on('agentfootprint.stream.token', (e) => {
      // Unnamed frames reach the browser as default "message" events.
      res.write(`data: ${JSON.stringify({ type: 'token', content: e.payload.content })}\n\n`);
    });
    agent.on('agentfootprint.stream.tool_start', (e) => {
      res.write(`data: ${JSON.stringify({ type: 'tool', name: e.payload.toolName })}\n\n`);
    });
    agent.on('agentfootprint.stream.llm_end', () => {
      // A named "done" event tells the client the stream is finished.
      res.write(`event: done\ndata: {}\n\n`);
      res.end();
    });

    await agent.run({ message: userInput });

The browser consumes via EventSource and renders tokens into the UI as they arrive.
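On the browser side, the frames written by the server snippet can be decoded with a small dispatcher. This is a sketch under assumptions: the `{ type: 'token', content }` and `{ type: 'tool', name }` message shapes mirror the server code above, while `dispatchMessage`, `Handlers`, and the `/chat/stream` endpoint are hypothetical names chosen for illustration.

```typescript
// Message shapes as serialized by the server snippet above.
type ServerMessage =
  | { type: "token"; content: string }
  | { type: "tool"; name: string };

// Hypothetical UI callbacks.
interface Handlers {
  onToken: (content: string) => void;
  onTool: (name: string) => void;
}

// Decode one SSE "data:" payload and route it to the matching handler.
function dispatchMessage(data: string, handlers: Handlers): void {
  const msg = JSON.parse(data) as ServerMessage;
  switch (msg.type) {
    case "token":
      handlers.onToken(msg.content);
      break;
    case "tool":
      handlers.onTool(msg.name);
      break;
  }
}

// Browser wiring (not executed here; "/chat/stream" is a hypothetical endpoint):
//   const es = new EventSource("/chat/stream");
//   es.onmessage = (ev) => dispatchMessage(ev.data, handlers);
//   es.addEventListener("done", () => es.close());
```

Calling `close()` on the `done` event matters because EventSource reconnects on its own once the server ends the response.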
When streaming isn’t useful
- Cost-sensitive batch jobs: `complete()` costs marginally less per call than `stream()` on most providers, though the difference is negligible.
- Tool-heavy ReAct loops: tool dispatch happens at `stream.llm_end`, not during token chunks. If your UI doesn’t render tokens, `complete()` is simpler.
- Tests / mocks: `mock({ reply: 'X' })` works in both modes; pick whichever the test asserts on.
Anti-patterns
- Don’t try to dispatch tools off `stream.token`. Tool calls land in the final chunk’s `response.toolCalls`. Token chunks are content-only.
- Don’t keep the SSE connection open after `stream.llm_end`. End it; the browser’s EventSource will reconnect automatically if needed. If the conversation is finished, call `close()` on the client when the `done` event arrives so that it does not reconnect.
- Don’t run two `stream()`s concurrently on the same agent. Build one, run one.
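If an agent instance might be reachable from more than one request, the last rule can be enforced in code rather than by convention. The wrapper below is a generic sketch, not a library feature: `singleFlight` is a hypothetical helper that makes an overlapping call fail fast instead of interleaving two streams.

```typescript
// Hypothetical helper: wraps any async run function so overlapping calls are rejected.
function singleFlight<A extends unknown[], R>(
  run: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  let inFlight = false;
  return async (...args: A): Promise<R> => {
    if (inFlight) throw new Error("a run is already in progress on this agent");
    inFlight = true;
    try {
      return await run(...args);
    } finally {
      // Reset whether the run succeeded or threw, so the next run can start.
      inFlight = false;
    }
  };
}
```

Wrapping the call that starts a run, for example `singleFlight((input) => agent.run(input))`, surfaces accidental concurrency as an immediate error; building a fresh agent per request avoids the problem entirely.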
Next steps
- Observability guide: the full 47-event taxonomy + recorders
- Mock provider: `mock({ reply })` streams word-by-word for $0 dev