Auto memory (Hybrid)

A customer comes back after a week and asks “what did we settle on?” Your agent needs three things at once: the last few messages of THIS conversation (recent), the extracted facts about THIS customer (semantic), and the decision evidence from past loan reviews (causal). One memory type can’t do all three. The hybrid pattern stacks them.

The defineMemory factory returns ONE memory definition. Real production agents need MULTIPLE — different types for different time horizons:

| Layer | Type × Strategy | Time horizon | Cost |
| --- | --- | --- | --- |
| Recent window | EPISODIC × WINDOW | Last N turns of THIS conversation | Free |
| Extracted facts | SEMANTIC × EXTRACT | All known facts about this user | One LLM call per write |
| Causal snapshots | CAUSAL × TOP_K | Decision evidence from past runs | Embedding call per query |

Stack them via multiple .memory(...) calls on the same agent. Each layer’s read subflow runs independently and contributes to the messages slot:

examples/memory/07-hybrid-auto.ts (region: hybrid-stack)
// 1. Short-term: last 10 turns (cheap, fast)
const recent = defineMemory({
  id: 'recent',
  type: MEMORY_TYPES.EPISODIC,
  strategy: { kind: MEMORY_STRATEGIES.WINDOW, size: 10 },
  store: recentStore,
});

// 2. Semantic facts: pattern-extracted, recency-loaded
const facts = defineMemory({
  id: 'facts',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: {
    kind: MEMORY_STRATEGIES.EXTRACT,
    extractor: 'pattern',
    maxPerTurn: 5,
  },
  store: factsStore,
});

// 3. Causal: snapshots of past runs, retrieved by semantic match
const causal = defineMemory({
  id: 'causal',
  type: MEMORY_TYPES.CAUSAL,
  strategy: {
    kind: MEMORY_STRATEGIES.TOP_K,
    topK: 1,
    threshold: 0.5,
    embedder,
  },
  store: causalStore,
});

The agent sees a layered context: short-term turns (most recent), extracted facts (always relevant), and the matching past snapshot (when cosine similarity clears the threshold). Each layer can use its OWN store: recentStore could be hot Redis, factsStore Postgres, causalStore S3 + pgvector, each tuned to that layer's read-frequency vs. durability needs.

Different parts of “what the agent knows” have different update rates and retention needs:

  • Recent window turns over every minute (every conversation turn).
  • Extracted facts turn over every day (new facts learned about the user).
  • Causal snapshots turn over every quarter (every meaningful decision the agent made).

A single store optimized for one cadence is wrong for the other two. RedisStore is great for recent (sub-ms latency, TTL). Postgres is great for facts (queryable, joinable). pgvector / Pinecone is great for causal (vector search). The hybrid pattern lets each layer use the right backend.

Multiple memories on the same agent each get their own scope key — memoryInjection_${id} — so they layer cleanly. The unique IDs (recent, facts, causal above) become observability event identifiers; you can filter agentfootprint.context.injected by e.payload.source === 'memory:facts' to see exactly which layer fired when.
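The scope-key and filtering mechanics can be shown self-contained. The event shape below is an assumption inferred from the `e.payload.source` filter quoted above; only the `memoryInjection_${id}` key format and the `memory:facts` source value come from this doc:

```typescript
// Scope key convention from the docs: memoryInjection_${id}.
const scopeKey = (memoryId: string): string => `memoryInjection_${memoryId}`;

// Hypothetical event shape, inferred from the filter expression in the text.
interface InjectionEvent {
  payload: { source: string; content: string };
}

const events: InjectionEvent[] = [
  { payload: { source: 'memory:recent', content: 'last 10 turns' } },
  { payload: { source: 'memory:facts', content: 'customer prefers email' } },
  { payload: { source: 'memory:causal', content: 'loan review snapshot' } },
];

// Which layer fired? Filter the injection events by the layer's id.
const factsOnly = events.filter((e) => e.payload.source === 'memory:facts');

console.log(scopeKey('facts')); // logs "memoryInjection_facts"
console.log(factsOnly.length); // logs 1
```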

  • Don’t share one store across layers — defeats the per-layer backend tuning. Use one store per layer, each adapter chosen for that layer’s access pattern.
  • Don’t mix incompatible strategies on one type — EXTRACT writes structured facts; WINDOW reads raw messages. Mixing them on one definition produces nonsense.
  • Don’t forget identity. All three layers scope by the same MemoryIdentity tuple — a single agent.run({ identity }) propagates to all of them.
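A sketch of the last bullet's identity rule: one tuple, passed once, scopes every layer's storage keys. The `MemoryIdentity` fields and the key format here are assumptions for illustration; the propagation behavior is what the doc describes:

```typescript
// Assumed shape for the identity tuple; the real MemoryIdentity may differ.
interface MemoryIdentity {
  userId: string;
  sessionId: string;
}

// Every layer derives its storage key from the SAME identity,
// so a single agent.run({ identity }) reaches all three layers.
const layerKey = (layerId: string, identity: MemoryIdentity): string =>
  `${layerId}:${identity.userId}:${identity.sessionId}`;

const identity: MemoryIdentity = { userId: 'cust-42', sessionId: 'sess-7' };
const keys = ['recent', 'facts', 'causal'].map((id) => layerKey(id, identity));

console.log(keys);
// logs ['recent:cust-42:sess-7', 'facts:cust-42:sess-7', 'causal:cust-42:sess-7']
```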