Streaming

Token-by-token output via provider.stream(). The same response is delivered as a stream for UI feedback AND as the authoritative LLMResponse for the ReAct loop — single round-trip, no double call.

A user types a question into your chat UI. They expect the answer to appear word by word, as in ChatGPT, not as one big block five seconds later. Behind the scenes, the agent uses that same response to decide whether to call a tool. One LLM call, two consumers: the UI streams tokens; the agent loop reads the final LLMResponse to decide its next ReAct step.

How streaming works

Every LLM provider implements two methods:

Method | Returns | Use
complete(req) | LLMResponse | One-shot; the full response arrives at the end
stream(req) | AsyncIterable<LLMChunk> | Token-by-token chunks, ending with a final chunk that carries the full LLMResponse

The agent calls stream() when streaming is active. Each chunk has tokenIndex, content, and done. The final chunk has done: true AND response: LLMResponse — the full authoritative shape with toolCalls, usage, stopReason. This single round-trip serves BOTH the UI's appetite for token-by-token feedback AND the agent loop's need for the structured response.
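The chunk contract described above can be sketched as follows. The type shapes are assumptions inferred from the field names in this section (tokenIndex, content, done, response), not the library's actual declarations. The usage field names are illustrative, and fakeStream is a hypothetical stand-in for provider.stream().

```typescript
// Assumed shapes, inferred from the field names described above.
interface ToolCall { name: string; args: Record<string, unknown>; }
interface LLMResponse {
  content: string;
  toolCalls: ToolCall[];
  usage: { inputTokens: number; outputTokens: number }; // illustrative field names
  stopReason: string;
}
interface LLMChunk {
  tokenIndex: number;
  content: string;
  done: boolean;
  response?: LLMResponse; // present only on the final chunk
}

// Hypothetical stand-in for provider.stream(): one chunk per token,
// then a final chunk that carries the full response.
async function* fakeStream(tokens: string[]): AsyncGenerator<LLMChunk> {
  let index = 0;
  for (const token of tokens) {
    yield { tokenIndex: index++, content: token, done: false };
  }
  yield {
    tokenIndex: index,
    content: '', // assumption: the final chunk adds no new text
    done: true,
    response: {
      content: tokens.join(''),
      toolCalls: [],
      usage: { inputTokens: 0, outputTokens: tokens.length },
      stopReason: 'stop',
    },
  };
}

// One pass over the stream serves both consumers: onToken feeds the UI,
// and the returned LLMResponse is what an agent loop would act on.
async function consume(
  stream: AsyncIterable<LLMChunk>,
  onToken: (text: string) => void,
): Promise<LLMResponse> {
  for await (const chunk of stream) {
    if (chunk.done && chunk.response) return chunk.response;
    onToken(chunk.content);
  }
  throw new Error('stream ended without a final chunk');
}
```

In this sketch the caller never issues a second request: the tokens passed to onToken and the returned response come from the same iteration.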

Listen to streamed events

Every runner exposes typed events. Subscribe to agentfootprint.stream.* for the streaming surface:

agent.on('agentfootprint.stream.tool_start', (e) =>
  console.log(`→ tool ${e.payload.toolName}(${JSON.stringify(e.payload.args)})`),
);
agent.on('agentfootprint.stream.tool_end', (e) =>
  console.log(`← tool result: ${e.payload.result}`),
);

The event taxonomy:

  • stream.llm_start — provider call about to start (model, request shape)
  • stream.token — one token chunk arrived (content, tokenIndex)
  • stream.llm_end — final chunk processed (full response, usage, stopReason)
  • stream.tool_start — agent about to dispatch a tool (name, args)
  • stream.tool_end — tool returned (result or error)
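One way to use this taxonomy in a UI is to fold the events into a small view state. The sketch below is illustrative only: the StreamEvent union is an assumed shape that uses just the payload fields named in the list above, and ViewState, initialView, and reduceView are hypothetical names, not part of the library.

```typescript
// Assumed event shapes; payload fields are limited to those named above.
type StreamEvent =
  | { type: 'agentfootprint.stream.llm_start'; payload: { model: string } }
  | { type: 'agentfootprint.stream.token'; payload: { content: string; tokenIndex: number } }
  | { type: 'agentfootprint.stream.llm_end'; payload: { stopReason: string } }
  | { type: 'agentfootprint.stream.tool_start'; payload: { toolName: string; args: unknown } }
  | { type: 'agentfootprint.stream.tool_end'; payload: { result: unknown } };

// Hypothetical UI state derived from the event feed.
interface ViewState {
  text: string;              // tokens rendered so far
  activeTool: string | null; // tool currently running, if any
  llmCalls: number;          // provider calls started in this run
}

const initialView: ViewState = { text: '', activeTool: null, llmCalls: 0 };

// Pure reducer: each event produces the next view state.
function reduceView(state: ViewState, e: StreamEvent): ViewState {
  switch (e.type) {
    case 'agentfootprint.stream.llm_start':
      return { ...state, llmCalls: state.llmCalls + 1 };
    case 'agentfootprint.stream.token':
      return { ...state, text: state.text + e.payload.content };
    case 'agentfootprint.stream.tool_start':
      return { ...state, activeTool: e.payload.toolName };
    case 'agentfootprint.stream.tool_end':
      return { ...state, activeTool: null };
    case 'agentfootprint.stream.llm_end':
      return state; // nothing to render; the agent loop acts on the response
  }
}
```

A listener registered with agent.on could call reduceView on each event and re-render; keeping the reducer pure makes it easy to test against a recorded event sequence.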

SSE for browser clients

The fastest path is the built-in toSSE() helper from agentfootprint/stream. Hand it the runner and it yields SSE-formatted strings (event: <name>\ndata: <json>\n\n) until the run finishes — drop it into any HTTP framework that accepts an async-iterable body:

import { toSSE } from 'agentfootprint/stream';

// Express
app.post('/agent', async (req, res) => {
  res.setHeader('content-type', 'text/event-stream');
  // Run in parallel so events flow while the iterable drains.
  const running = agent.run({ message: req.body.message });
  for await (const chunk of toSSE(agent, {
    // stream events only; omit `filter` to forward every event
    filter: (e) => e.type.startsWith('agentfootprint.stream.'),
  })) {
    res.write(chunk);
  }
  await running;
  res.end();
});

toSSE(runner, options?) accepts four options: filter (skip events), format: 'full' | 'text' (where 'text' yields raw token content only, ready to pipe straight into a chat UI), eventName (rename events), and heartbeatMs (emit keep-alive ": ping" comment lines so proxies don't drop an idle connection). The class form, new SSEFormatter(agent).stream(), is equivalent. encodeSSE(name, payload) formats a one-off frame for app-level events outside the typed registry.
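For reference, the wire format mentioned above (event: <name>\ndata: <json>\n\n) is simple enough to sketch by hand. The helper below is a hypothetical local function written for illustration; it is not the library's encodeSSE implementation and may differ from it in detail.

```typescript
// Hypothetical local helper, NOT the library's encodeSSE implementation.
// Builds one SSE frame: an event line, a data line, and a blank line.
function formatFrame(eventName: string, payload: unknown): string {
  // JSON.stringify (without an indent argument) escapes newlines inside
  // strings, so the JSON always fits on a single data: line.
  const json = JSON.stringify(payload ?? null);
  return `event: ${eventName}\ndata: ${json}\n\n`;
}

// SSE comment frame: lines starting with ':' are ignored by clients,
// which makes them suitable as keep-alive pings.
const HEARTBEAT_FRAME = ': ping\n\n';
```

Calling formatFrame('token', { content: 'Hi' }) produces the string 'event: token\ndata: {"content":"Hi"}\n\n'.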

If you'd rather wire the events by hand, subscribe and write each frame yourself:

// Express / Hono / Fastify
agent.on('agentfootprint.stream.token', (e) => {
  res.write(`data: ${JSON.stringify({ type: 'token', content: e.payload.content })}\n\n`);
});
agent.on('agentfootprint.stream.tool_start', (e) => {
  res.write(`data: ${JSON.stringify({ type: 'tool', name: e.payload.toolName })}\n\n`);
});
agent.on('agentfootprint.stream.llm_end', () => {
  res.write(`event: done\ndata: {}\n\n`);
  res.end();
});

await agent.run({ message: userInput });

The browser consumes via EventSource and renders tokens into the UI as they arrive.
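EventSource parses frames internally and only issues GET requests, so a client that POSTs the message typically reads the response body with fetch and parses frames itself. The sketch below shows that parsing step for the frames written by the hand-wired example. It is a simplification that assumes \n line endings and handles only the event: and data: fields; parseFrames and Frame are hypothetical names.

```typescript
interface Frame {
  event: string; // 'message' when the frame has no event: line (the SSE default)
  data: string;
}

// Splits a text buffer into complete frames plus the unconsumed remainder,
// so the caller can append the next network chunk to `rest` and call again.
function parseFrames(buffer: string): { frames: Frame[]; rest: string } {
  const parts = buffer.split('\n\n');
  const rest = parts.pop() ?? ''; // trailing partial frame, if any
  const frames: Frame[] = [];
  for (const part of parts) {
    let event = 'message';
    const data: string[] = [];
    for (const line of part.split('\n')) {
      if (line.startsWith(':')) continue; // comment or heartbeat
      if (line.startsWith('event:')) {
        event = line.slice('event:'.length).trim();
      } else if (line.startsWith('data:')) {
        // Strip at most one leading space after the colon.
        data.push(line.slice('data:'.length).replace(/^ /, ''));
      }
    }
    // Frames without data (for example, pure comments) are not dispatched.
    if (data.length > 0) frames.push({ event, data: data.join('\n') });
  }
  return { frames, rest };
}
```

Each returned frame's data can then be passed to JSON.parse, with tokens appended to the UI and the done event used to stop reading.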

When streaming isn't useful

  • Cost-sensitive batch jobs: complete() has slightly lower per-call overhead than stream() on most providers, though the difference is negligible.
  • Tool-heavy ReAct loops: tool dispatch happens at stream.llm_end, not during token chunks. If your UI doesn't render tokens, complete() is simpler.
  • Tests / mocks: mock({ reply: 'X' }) works in both modes; pick whichever the test asserts on.

Anti-patterns

  • Don't try to dispatch tools off stream.token. Tool calls land in the final chunk's response.toolCalls. Token chunks are content-only.
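  • Don't keep the SSE connection open after stream.llm_end. End it. The browser's EventSource reconnects automatically whenever the connection closes, so have the client call close() when it receives the done event unless a reconnect is wanted.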
  • Don't run two stream()s concurrently on the same agent. Build one, run one.
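If accidental overlap is a concern, the last rule can be enforced in application code. The sketch below is one possible guard, not a library feature: singleFlight is a hypothetical helper that wraps any async run function and rejects a call made while an earlier one is still in flight.

```typescript
// Hypothetical guard: overlapping calls fail fast instead of interleaving.
function singleFlight<A extends unknown[], R>(
  run: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  let inFlight = false;
  return async (...args: A): Promise<R> => {
    if (inFlight) {
      throw new Error('a run is already in progress on this agent');
    }
    inFlight = true;
    try {
      return await run(...args);
    } finally {
      inFlight = false; // released on success and on failure
    }
  };
}
```

Wrapping a call such as (input) => agent.run(input) with singleFlight would surface a second concurrent run as an error; building a fresh agent per request avoids the situation entirely.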
