
Semantic retrieval (Top-K)

A user asks “how do I cancel my subscription?” on turn 1. On turn 47 they ask the same thing differently — “can I get out of the plan?” You’d want the SAME context to load both times, even though the wording differs. Semantic retrieval is what makes that possible: embed both queries, find the same documents in vector space.

defineMemory({ type: SEMANTIC, strategy: { kind: TOP_K, topK, threshold, embedder } }) — every read embeds the current query, cosine-searches the vector store, returns the top-K most-similar entries that meet the threshold.

The read subflow:

  1. Embed the current userMessage into a vector
  2. Call store.search(identity, queryVector, { k: topK, minScore: threshold })
  3. Format matches as a system message
  4. Inject into the messages slot for this turn
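
A minimal sketch of that read path, assuming illustrative Embedder and SearchableStore interfaces shaped like the calls above (the names and signatures here are assumptions for illustration, not the library's internals):

// Illustrative sketch only; the real read step lives inside the library.
interface Embedder {
  embed(text: string): Promise<number[]>;
}
interface SearchableStore {
  search(
    identity: string,
    queryVector: number[],
    opts: { k: number; minScore: number },
  ): Promise<{ content: string; score: number }[]>;
}

async function semanticRead(
  identity: string,
  userMessage: string,
  embedder: Embedder,
  store: SearchableStore,
  opts: { topK: number; threshold: number },
) {
  // 1. Embed the current user message into a query vector
  const queryVector = await embedder.embed(userMessage);

  // 2. Cosine-search the vector store
  const matches = await store.search(identity, queryVector, {
    k: opts.topK,
    minScore: opts.threshold,
  });

  // Nothing above the threshold: inject nothing this turn
  if (matches.length === 0) return [];

  // 3.-4. Format matches as a system message to inject into the messages slot
  return [
    {
      role: 'system' as const,
      content: matches.map((m) => m.content).join('\n\n'),
    },
  ];
}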

This is the engine behind defineRAG. RAG is just SEMANTIC × TOP_K with retrieval-tuned defaults (asRole: 'user', topK: 3, threshold: 0.7).
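
As a rough illustration, not the library's literal expansion (defineRAG's exact option shape and the placement of asRole are assumptions here), the sugar unrolls to something like:

const rag = defineRAG({ id: 'docs-rag', embedder, store });

// Approximately the same memory written out by hand, using the defaults above:
const ragByHand = defineMemory({
  id: 'docs-rag',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: {
    kind: MEMORY_STRATEGIES.TOP_K,
    topK: 3,        // retrieval-tuned default
    threshold: 0.7, // retrieval-tuned default
    embedder,
  },
  asRole: 'user', // placement assumed; RAG injects retrieved context as a user-role message
  store,
});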

examples/memory/04-topK-strategy.ts (region: define)
const memory = defineMemory({
  id: 'semantic-recall',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: {
    kind: MEMORY_STRATEGIES.TOP_K,
    topK: 3,        // up to 3 most-relevant entries
    threshold: 0.6, // strict: drop matches below 0.6 cosine
    embedder,
  },
  store,
});

topK caps the number of matches returned. threshold is the minimum cosine similarity for a match to count — strict by design (see below). embedder produces the query vector and is called once per turn at read time.

threshold is strict. When NO entries meet the threshold, NO content injects; the read is skipped for that turn by design. Garbage low-confidence chunks pollute the prompt and make hallucination MORE likely, not less, so silently injecting weak matches is the wrong default.

Tuning notes:

  • Sentence-BERT-class embedders (all-MiniLM-L6-v2, etc.) often score relevant matches at 0.4–0.6. Use threshold: 0.5.
  • OpenAI text-embedding-3-* and Cohere embed-v3 typically sit comfortably with threshold: 0.7.
  • If you see frequent zero-result silent skips, lower the threshold and observe agentfootprint.context.injected to see which entries fire.
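
For example, the same memory tuned for a MiniLM-class embedder might simply lower the threshold (miniLmEmbedder is a placeholder for whatever adapter you actually use):

const minilmRecall = defineMemory({
  id: 'semantic-recall-minilm',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: {
    kind: MEMORY_STRATEGIES.TOP_K,
    topK: 3,
    threshold: 0.5, // MiniLM-class scores cluster lower than OpenAI/Cohere models
    embedder: miniLmEmbedder, // placeholder adapter
  },
  store,
});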

Each entry written with an embedder is tagged with the embedder’s id. Read-side cosine search filters to entries from the SAME embedder when embedderId is set on the search call. Mixing embedders silently corrupts retrieval — vectors from different models live in incompatible spaces.

If you swap embedders (model upgrade, provider switch), either:

  1. Re-embed and re-write all entries (invalidates the old embedderId), or
  2. Run two parallel SEMANTIC memories with different embedderIds during the migration
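
A sketch of option 2, keeping old and new embedders readable side by side until the backfill finishes (the ids and embedder names are illustrative):

// Old entries stay searchable through the old embedder's vector space...
const legacyRecall = defineMemory({
  id: 'semantic-recall-v1',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: { kind: MEMORY_STRATEGIES.TOP_K, topK: 3, threshold: 0.6, embedder: oldEmbedder },
  store,
});

// ...while new writes and reads use the new embedder.
const nextRecall = defineMemory({
  id: 'semantic-recall-v2',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: { kind: MEMORY_STRATEGIES.TOP_K, topK: 3, threshold: 0.7, embedder: newEmbedder },
  store,
});

// Retire legacyRecall once the corpus has been re-embedded under newEmbedder.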

SEMANTIC × TOP_K requires a MemoryStore that implements search(). Today:

  • InMemoryStore (dev / tests) — O(n) linear scan, fine for thousands of entries
  • ⏳ pgvector / Pinecone / Qdrant / Weaviate (planned v2.6) — true vector indexing

The v2.3 production adapters (RedisStore, AgentCoreStore) ship without search() for now — Redis needs the RedisSearch module for vector indexing; AgentCore’s retrieval API is server-side and doesn’t fit the cosine-similarity contract. RAG users with v2.3 should use InMemoryStore until vector-capable production adapters land.
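
For intuition, the linear scan InMemoryStore performs is conceptually just cosine similarity over every stored vector. A standalone sketch (not the library's actual implementation) looks like this:

interface StoredEntry {
  content: string;
  vector: number[];
  embedderId?: string;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// O(n) scan: filter to the matching embedder, score every entry,
// drop anything below the threshold, keep the top k.
function linearSearch(
  entries: StoredEntry[],
  queryVector: number[],
  opts: { k: number; minScore: number; embedderId?: string },
) {
  return entries
    .filter((e) => !opts.embedderId || e.embedderId === opts.embedderId)
    .map((e) => ({ entry: e, score: cosine(queryVector, e.vector) }))
    .filter((m) => m.score >= opts.minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, opts.k);
}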

When to use this vs RAG vs other strategies

Situation                                     Strategy
External docs corpus, retrieve-then-answer    defineRAG (sugar over this)
Conversation memory with cosine recall        SEMANTIC × TOP_K (this guide)
Conversation memory, just keep recent         EPISODIC × WINDOW
Conversation memory, distill into facts       SEMANTIC × EXTRACT

  • Don’t fall back to injecting the top K anyway when the threshold returns nothing. Skipping the injection is the design.
  • Don’t change embedders without re-indexing. Read-side filtering is your safety net; rotating embedders without the embedderId filter is a footgun.
  • Don’t pass huge entries (e.g. 50 KB of content) to a vector store designed for small chunks. Chunk first (a minimal chunker is sketched below).
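
A minimal chunker for that last point; the sizes are arbitrary and should be tuned to your embedder and content:

// Naive fixed-size chunker: split long content before writing it to the store.
function chunkText(text: string, maxChars = 1000, overlap = 100): string[] {
  const step = Math.max(1, maxChars - overlap); // guard against overlap >= maxChars
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChars));
  }
  return chunks;
}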