Streaming
Token-by-token output via provider.stream(). The same response is delivered as a stream for UI feedback AND as the authoritative LLMResponse for the ReAct loop — single round-trip, no double call.
A user types a question into your chat UI. They expect the answer to appear word-by-word like ChatGPT — not in one big block five seconds later. Behind the scenes the agent is also using that same response to decide whether to call a tool. One LLM call, two consumers: the UI streams tokens; the agent loop reads the final LLMResponse to drive ReAct decisioning.
How streaming works
Every LLM provider implements two methods:
| Method | Returns | Use |
|---|---|---|
| complete(req) | LLMResponse | One-shot; the full response arrives at the end |
| stream(req) | AsyncIterable<LLMChunk> | Token-by-token chunks, ending with a final chunk that carries the full LLMResponse |
The agent calls stream() when streaming is active. Each chunk has tokenIndex, content, and done. The final chunk has done: true AND response: LLMResponse — the full authoritative shape with toolCalls, usage, stopReason. This single round-trip serves BOTH the UI's appetite for token-by-token feedback AND the agent loop's need for the structured response.
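The single-pass idea described above can be sketched with local stand-in types. The interfaces below are assumptions built from the field names in this section (the `content` field on the response and the `usage` field names are guesses), and `fakeStream` and `consume` are hypothetical helpers, not part of agentfootprint.

```typescript
// Local stand-in types, assumed from the field names in the prose above;
// the real agentfootprint definitions may carry more or different fields.
interface LLMResponse {
  content: string; // assumed field
  toolCalls: { name: string; args: Record<string, unknown> }[];
  usage: { inputTokens: number; outputTokens: number }; // assumed field names
  stopReason: string;
}

interface LLMChunk {
  tokenIndex: number;
  content: string;
  done: boolean;
  response?: LLMResponse; // only set on the final chunk
}

// Hypothetical provider stream: one chunk per piece, then a final chunk.
async function* fakeStream(pieces: string[]): AsyncIterable<LLMChunk> {
  let index = 0;
  for (const piece of pieces) {
    yield { tokenIndex: index++, content: piece, done: false };
  }
  yield {
    tokenIndex: index,
    content: '',
    done: true,
    response: {
      content: pieces.join(''),
      toolCalls: [],
      usage: { inputTokens: 0, outputTokens: pieces.length },
      stopReason: 'stop',
    },
  };
}

// One pass over the stream feeds both consumers: token chunks go to the
// UI callback, and the final chunk's response is returned to the caller.
async function consume(
  stream: AsyncIterable<LLMChunk>,
  onToken: (text: string) => void,
): Promise<LLMResponse> {
  let final: LLMResponse | undefined;
  for await (const chunk of stream) {
    if (chunk.done) {
      final = chunk.response;
    } else {
      onToken(chunk.content);
    }
  }
  if (!final) throw new Error('stream ended without a final response');
  return final;
}
```

A caller would pass a UI update function as `onToken` and hand the returned response to whatever decides the next ReAct step, so the provider is called only once.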
Listen to streamed events
Every runner exposes typed events. Subscribe to agentfootprint.stream.* for the streaming surface:
agent.on('agentfootprint.stream.tool_start', (e) =>
  console.log(`→ tool ${e.payload.toolName}(${JSON.stringify(e.payload.args)})`),
);
agent.on('agentfootprint.stream.tool_end', (e) =>
  console.log(`← tool result: ${e.payload.result}`),
);

The event taxonomy:

- stream.llm_start: provider call about to start (model, request shape)
- stream.token: one token chunk arrived (content, tokenIndex)
- stream.llm_end: final chunk processed (full response, usage, stopReason)
- stream.tool_start: agent about to dispatch a tool (name, args)
- stream.tool_end: tool returned (result or error)
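To make the taxonomy concrete, here is a minimal sketch that folds these events into a run transcript. `FakeAgent` is a hypothetical stand-in exposing only `on` and `emit`, and the payload shape is assumed from the field names listed above; a real runner's event types may differ.

```typescript
// Payload shape assumed from the taxonomy above; not the library's types.
type StreamEvent = { payload: { content?: string; toolName?: string } };
type Listener = (e: StreamEvent) => void;

// Hypothetical stand-in for a runner: just enough to subscribe and emit.
class FakeAgent {
  private listeners = new Map<string, Listener[]>();

  on(type: string, fn: Listener): void {
    const list = this.listeners.get(type) ?? [];
    list.push(fn);
    this.listeners.set(type, list);
  }

  emit(type: string, payload: StreamEvent['payload']): void {
    for (const fn of this.listeners.get(type) ?? []) fn({ payload });
  }
}

// Subscribes to three stream events and accumulates what they report.
function attachTranscript(agent: { on(type: string, fn: Listener): void }) {
  const transcript = { text: '', tools: [] as string[], llmCalls: 0 };
  agent.on('agentfootprint.stream.llm_start', () => {
    transcript.llmCalls += 1;
  });
  agent.on('agentfootprint.stream.token', (e) => {
    transcript.text += e.payload.content ?? '';
  });
  agent.on('agentfootprint.stream.tool_start', (e) => {
    if (e.payload.toolName) transcript.tools.push(e.payload.toolName);
  });
  return transcript;
}
```

Because `attachTranscript` only needs an object with an `on` method, the same function could be pointed at a real runner, provided its payloads carry the fields assumed here.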
SSE for browser clients
The fastest path is the built-in toSSE() helper from agentfootprint/stream. Hand it the runner and it yields SSE-formatted strings (event: <name>\ndata: <json>\n\n) until the run finishes — drop it into any HTTP framework that accepts an async-iterable body:
import { toSSE } from 'agentfootprint/stream';

// Express
app.post('/agent', async (req, res) => {
  res.setHeader('content-type', 'text/event-stream');
  // Start the run without awaiting it so events flow while the loop below drains them.
  const running = agent.run({ message: req.body.message });
  for await (const chunk of toSSE(agent, {
    // stream.* events only; omit `filter` to forward every event
    filter: (e) => e.type.startsWith('agentfootprint.stream.'),
  })) {
    res.write(chunk);
  }
  await running;
  res.end();
});

toSSE(runner, options?) accepts filter (skip events), format: 'full' | 'text' ('text' yields raw token content only, ready to pipe straight into a chat UI), eventName (rename events), and heartbeatMs (keep-alive ": ping" comments for proxies). The class form new SSEFormatter(agent).stream() is equivalent. encodeSSE(name, payload) formats a one-off frame for app-level events outside the typed registry.
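For reference, the wire format that these helpers produce can be sketched in a few lines. The functions below are hypothetical illustrations of SSE framing as defined by the server-sent events standard; they are not the library's toSSE or encodeSSE, whose exact output may differ.

```typescript
// Named event frame: an `event:` line, one `data:` line, then a blank line.
// JSON.stringify never emits raw newlines, so the payload fits on one line.
function formatFrame(name: string, payload: unknown): string {
  return `event: ${name}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Plain-text frame: SSE requires every line of a multi-line value to carry
// its own `data:` prefix; the client rejoins the lines with "\n".
function formatTextFrame(content: string): string {
  const lines = content.split('\n').map((line) => `data: ${line}`);
  return `${lines.join('\n')}\n\n`;
}

// Keep-alive: a line starting with ":" is a comment that clients ignore,
// but it keeps idle proxies from closing the connection.
function heartbeatFrame(): string {
  return ': ping\n\n';
}
```

Keeping the per-line `data:` prefix in mind matters mostly for a text-only feed, where a token can itself contain a newline.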
If you'd rather wire the events by hand, subscribe and write each frame yourself:
// Express / Hono / Fastify
agent.on('agentfootprint.stream.token', (e) => {
  res.write(`data: ${JSON.stringify({ type: 'token', content: e.payload.content })}\n\n`);
});
agent.on('agentfootprint.stream.tool_start', (e) => {
  res.write(`data: ${JSON.stringify({ type: 'tool', name: e.payload.toolName })}\n\n`);
});
agent.on('agentfootprint.stream.llm_end', () => {
  res.write(`event: done\ndata: {}\n\n`);
  res.end();
});
await agent.run({ message: userInput });

The browser consumes the stream via EventSource and renders tokens into the UI as they arrive.
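One caveat on the client side: the browser's EventSource only issues GET requests, so a POST route has to be read with fetch and parsed by hand. The parser below is a minimal sketch of that parsing step; `parseSSE` and `SSEFrame` are hypothetical names, and the code assumes "\n" line endings and ignores the `id:` and `retry:` fields.

```typescript
interface SSEFrame {
  event: string; // "message" when the frame has no `event:` line
  data: string;
}

// Splits a text buffer into complete SSE frames. Whatever follows the last
// blank line is returned as `rest` so the caller can prepend it to the
// next network chunk.
function parseSSE(buffer: string): { frames: SSEFrame[]; rest: string } {
  const blocks = buffer.split('\n\n');
  const rest = blocks.pop() ?? '';
  const frames: SSEFrame[] = [];
  for (const block of blocks) {
    let event = 'message';
    const data: string[] = [];
    for (const line of block.split('\n')) {
      if (line.startsWith(':')) continue; // comment or heartbeat
      const colon = line.indexOf(':');
      if (colon === -1) continue; // simplification: skip bare field names
      const field = line.slice(0, colon);
      let value = line.slice(colon + 1);
      if (value.startsWith(' ')) value = value.slice(1); // one optional space
      if (field === 'event') event = value;
      else if (field === 'data') data.push(value);
    }
    // Frames without any data line (for example heartbeats) are dropped.
    if (data.length > 0) frames.push({ event, data: data.join('\n') });
  }
  return { frames, rest };
}
```

With fetch, the caller would decode each chunk from the response body, append it to a running buffer, call `parseSSE`, render the returned frames, and keep `rest` as the new buffer.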
When streaming isn't useful
- Cost-sensitive batch jobs: complete() has slightly lower per-call overhead than stream() on most providers, though the difference is negligible.
- Tool-heavy ReAct loops: tool dispatch happens at stream.llm_end, not during token chunks. If your UI doesn't render tokens, complete() is simpler.
- Tests / mocks: mock({ reply: 'X' }) works in both modes; pick whichever the test asserts on.
Anti-patterns
- Don't try to dispatch tools off stream.token. Tool calls land in the final chunk's response.toolCalls. Token chunks are content-only.
- Don't keep the SSE connection open after stream.llm_end. End it; the browser's EventSource will reconnect automatically if needed.
- Don't run two stream()s concurrently on the same agent. Build one, run one.
Next steps
- Observability guide: the full 59-event taxonomy + recorders
- Quick start: mock({ reply }) streams word-by-word for $0 dev
