Narrative memory (Summarization)

Your support agent is on turn 47 of a single conversation. The user is debugging an integration issue and the back-and-forth has been going for 90 minutes. The context window is filling up with detail that’s no longer relevant — the LLM is one tool call away from losing the thread. The fix isn’t a bigger context window — it’s compressing the older turns into a summary so the LLM keeps the BEAT of the conversation without the noise.

`defineMemory({ type: EPISODIC, strategy: { kind: SUMMARIZE, recent, llm } })` — keeps the last N turns raw, runs older turns through a cheap LLM summarizer, persists the summary as a “beat” for future reads.

The recent parameter sets the boundary. Turns inside the window are stored as raw messages. Turns past it are compressed into one summary per write batch. On reads, the messages slot gets: [summary of older turns] + [last N raw turns].
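A minimal sketch of that read-time assembly, assuming illustrative names (`composeMessages` and the `Msg` type are not the library’s API):

```typescript
// Illustrative only: the real library's message types and read path differ.
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

// Prepend the stored summary (if any) as one system message, then append
// the last N raw turns -- the [summary] + [recent turns] contract above.
function composeMessages(summary: string | null, recentTurns: Msg[]): Msg[] {
  const head: Msg[] =
    summary === null
      ? []
      : [{ role: 'system', content: `Conversation so far: ${summary}` }];
  return [...head, ...recentTurns];
}
```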

Cheaper than infinite raw retention. More semantic than truncation.

examples/memory/03-summarize-strategy.ts (region: define)

```ts
const memory = defineMemory({
  id: 'long-chat',
  type: MEMORY_TYPES.EPISODIC,
  strategy: {
    kind: MEMORY_STRATEGIES.SUMMARIZE,
    recent: 6, // keep last 6 turns raw, summarize older
    llm: summarizer, // dedicated cheap model for compression
  },
  store,
});
```

The llm parameter is the dedicated summarizer model. Use a cheap one — Haiku, GPT-4o-mini, or any local model. The summarizer doesn’t need to be smart; it needs to be FAST and CHEAP because it runs on every memory write.

The write subflow runs after every successful turn. Each invocation:

  1. Loads the current entries from the store
  2. If count > recent, takes the OLDEST (count - recent) turns
  3. Sends them to the summarizer LLM with a fixed compression prompt
  4. Replaces the older turns with one summary entry tagged kind: 'summary'
  5. Keeps the recent turns unchanged

The summary is itself an entry in the store, so future writes summarize the previous summary + new older turns into a fresh summary. The conversation BEATS get progressively coarser the further back you look — exactly like human episodic memory.

| Situation | Strategy |
| --- | --- |
| Short chats (under window size) | WINDOW (no compression needed) |
| Long chats where the BEAT matters more than the detail | SUMMARIZE (this guide) |
| Long chats where you want STRUCTURED facts, not narrative | EXTRACT |
| Cross-run “what did we settle on last time?” | NARRATIVE type (different shape) or CAUSAL type (decision evidence) |

Every read and write scopes by MemoryIdentity. A summary belongs to one (tenant, principal, conversationId) tuple — no cross-tenant leakage even if the same store backs many tenants.
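A sketch of what that scoping implies for storage keys; `MemoryIdentity`’s real shape and the key format here are assumptions, not the library’s implementation:

```typescript
// Assumed shape: the guide names the (tenant, principal, conversationId) tuple.
type MemoryIdentity = { tenant: string; principal: string; conversationId: string };

// One namespace per identity tuple: two tenants sharing a store can never
// read each other's summaries because their keys never collide.
function storageKey(id: MemoryIdentity): string {
  return `${id.tenant}:${id.principal}:${id.conversationId}`;
}
```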

  • Don’t use the production model as the summarizer. Cost will dominate. Use a cheap model dedicated to compression.
  • Don’t summarize every turn. The library already batches — summarization fires only when the number of turns past the recent window justifies it.
  • Don’t summarize facts. Use SEMANTIC × EXTRACT for facts. SUMMARIZE compresses NARRATIVE flow; EXTRACT distills DATA.