
Semantic retrieval (Top-K)

A user asks “how do I cancel my subscription?” on turn 1. On turn 47 they ask the same thing differently — “can I get out of the plan?” You’d want the SAME context to load both times, even though the wording differs. Semantic retrieval is what makes that possible: embed both queries, find the same documents in vector space.

defineMemory({ type: SEMANTIC, strategy: { kind: TOP_K, topK, threshold, embedder } }) — every read embeds the current query, cosine-searches the vector store, returns the top-K most-similar entries that meet the threshold.

The read subflow:

  1. Embed the current userMessage into a vector
  2. Call store.search(identity, queryVector, { k: topK, minScore: threshold })
  3. Format matches as a system message
  4. Inject into the messages slot for this turn
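
A minimal sketch of that read path, assuming illustrative Embedder and SearchableStore interfaces shaped like the calls above (the names and signatures here are assumptions for illustration, not the library's internals):

// Illustrative sketch only; the real read step lives inside the library.
interface Embedder {
  embed(text: string): Promise<number[]>;
}
interface SearchableStore {
  search(
    identity: string,
    queryVector: number[],
    opts: { k: number; minScore: number },
  ): Promise<{ content: string; score: number }[]>;
}

async function semanticRead(
  identity: string,
  userMessage: string,
  embedder: Embedder,
  store: SearchableStore,
  opts: { topK: number; threshold: number },
) {
  // 1. Embed the current user message into a query vector
  const queryVector = await embedder.embed(userMessage);

  // 2. Cosine-search the vector store
  const matches = await store.search(identity, queryVector, {
    k: opts.topK,
    minScore: opts.threshold,
  });

  // Nothing above the threshold: inject nothing this turn
  if (matches.length === 0) return [];

  // 3.-4. Format matches as a system message to inject into the messages slot
  return [
    {
      role: 'system' as const,
      content: matches.map((m) => m.content).join('\n\n'),
    },
  ];
}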

This is the engine behind defineRAG. RAG is just SEMANTIC × TOP_K with retrieval-tuned defaults (asRole: 'user', topK: 3, threshold: 0.7).
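
As a rough illustration, not the library's literal expansion (defineRAG's exact option shape and the placement of asRole are assumptions here), the sugar unrolls to something like:

const rag = defineRAG({ id: 'docs-rag', embedder, store });

// Approximately the same memory written out by hand, using the defaults above:
const ragByHand = defineMemory({
  id: 'docs-rag',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: {
    kind: MEMORY_STRATEGIES.TOP_K,
    topK: 3,        // retrieval-tuned default
    threshold: 0.7, // retrieval-tuned default
    embedder,
  },
  asRole: 'user', // placement assumed; RAG injects retrieved context as a user-role message
  store,
});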

examples/memory/04-topK-strategy.ts (region: define)
const memory = defineMemory({
  id: 'semantic-recall',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: {
    kind: MEMORY_STRATEGIES.TOP_K,
    topK: 3,        // up to 3 most-relevant entries
    threshold: 0.6, // strict: drop matches below 0.6 cosine
    embedder,
  },
  store,
});

topK caps the number of matches returned. threshold is the minimum cosine similarity for a match to count — strict by design (see below). embedder produces the query vector and is called once per turn at read time.

threshold is strict. When NO entries meet the threshold, NO content injects; the read is skipped for that turn by design. Garbage low-confidence chunks pollute the prompt and make hallucination MORE likely, not less, so silently injecting weak matches is the wrong default.

Tuning notes:

  • Sentence-BERT-class embedders (all-MiniLM-L6-v2, etc.) often score relevant matches at 0.4–0.6. Use threshold: 0.5.
  • OpenAI text-embedding-3-* and Cohere embed-v3 typically sit comfortably with threshold: 0.7.
  • If you see frequent zero-result silent skips, lower the threshold and observe agentfootprint.context.injected to see which entries fire.
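
For example, the same memory tuned for a MiniLM-class embedder might simply lower the threshold (miniLmEmbedder is a placeholder for whatever adapter you actually use):

const minilmRecall = defineMemory({
  id: 'semantic-recall-minilm',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: {
    kind: MEMORY_STRATEGIES.TOP_K,
    topK: 3,
    threshold: 0.5, // MiniLM-class scores cluster lower than OpenAI/Cohere models
    embedder: miniLmEmbedder, // placeholder adapter
  },
  store,
});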

Each entry written with an embedder is tagged with the embedder’s id. Read-side cosine search filters to entries from the SAME embedder when embedderId is set on the search call. Mixing embedders silently corrupts retrieval — vectors from different models live in incompatible spaces.

If you swap embedders (model upgrade, provider switch), either:

  1. Re-embed and re-write all entries (invalidates the old embedderId), or
  2. Run two parallel SEMANTIC memories with different embedderIds during the migration
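
A sketch of option 2, keeping old and new embedders readable side by side until the backfill finishes (the ids and embedder names are illustrative):

// Old entries stay searchable through the old embedder's vector space...
const legacyRecall = defineMemory({
  id: 'semantic-recall-v1',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: { kind: MEMORY_STRATEGIES.TOP_K, topK: 3, threshold: 0.6, embedder: oldEmbedder },
  store,
});

// ...while new writes and reads use the new embedder.
const nextRecall = defineMemory({
  id: 'semantic-recall-v2',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: { kind: MEMORY_STRATEGIES.TOP_K, topK: 3, threshold: 0.7, embedder: newEmbedder },
  store,
});

// Retire legacyRecall once the corpus has been re-embedded under newEmbedder.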

SEMANTIC × TOP_K requires a MemoryStore that implements search(). Today:

  • InMemoryStore (dev / tests) — O(n) linear scan, fine for thousands of entries
  • ⏳ pgvector / Pinecone / Qdrant / Weaviate (planned v2.6) — true vector indexing

The v2.3 production adapters (RedisStore, AgentCoreStore) ship without search() for now — Redis needs the RedisSearch module for vector indexing; AgentCore’s retrieval API is server-side and doesn’t fit the cosine-similarity contract. RAG users with v2.3 should use InMemoryStore until vector-capable production adapters land.
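
For intuition, the linear scan InMemoryStore performs is conceptually just cosine similarity over every stored vector. A standalone sketch (not the library's actual implementation) looks like this:

interface StoredEntry {
  content: string;
  vector: number[];
  embedderId?: string;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// O(n) scan: filter to the matching embedder, score every entry,
// drop anything below the threshold, keep the top k.
function linearSearch(
  entries: StoredEntry[],
  queryVector: number[],
  opts: { k: number; minScore: number; embedderId?: string },
) {
  return entries
    .filter((e) => !opts.embedderId || e.embedderId === opts.embedderId)
    .map((e) => ({ entry: e, score: cosine(queryVector, e.vector) }))
    .filter((m) => m.score >= opts.minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, opts.k);
}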

When to use this vs RAG vs other strategies

Situation                                     Strategy
External docs corpus, retrieve-then-answer    defineRAG (sugar over this)
Conversation memory with cosine recall        SEMANTIC × TOP_K (this guide)
Conversation memory, just keep recent         EPISODIC × WINDOW
Conversation memory, distill into facts       SEMANTIC × EXTRACT

  • Don’t fall back to injecting the top K anyway when the threshold returns nothing. Skipping the injection is the design.
  • Don’t change embedders without re-indexing. Read-side filtering is your safety net; rotating embedders without the embedderId filter is a footgun.
  • Don’t pass huge entries (e.g. 50 KB of content) to a vector store designed for small chunks. Chunk first (a minimal chunker is sketched below).
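
A minimal chunker for that last point; the sizes are arbitrary and should be tuned to your embedder and content:

// Naive fixed-size chunker: split long content before writing it to the store.
function chunkText(text: string, maxChars = 1000, overlap = 100): string[] {
  const step = Math.max(1, maxChars - overlap); // guard against overlap >= maxChars
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChars));
  }
  return chunks;
}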