Semantic retrieval (Top-K)

Cosine-similarity search over embedded entries. Returns up to K of the most relevant entries that meet a strict threshold. This is the read-side primitive behind RAG.

A user asks "how do I cancel my subscription?" on turn 1. On turn 47 they ask the same thing differently — "can I get out of the plan?" You'd want the SAME context to load both times, even though the wording differs. Semantic retrieval is what makes that possible: embed both queries, find the same documents in vector space.

What semantic retrieval is

defineMemory({ type: SEMANTIC, strategy: { kind: TOP_K, topK, threshold, embedder } }) configures it: every read embeds the current query, cosine-searches the vector store, and returns the top-K most similar entries that meet the threshold.
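For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below is illustrative only: cosineSimilarity is a hypothetical helper name, not part of the library's API, and a real store may compute the score differently (for example on pre-normalized vectors).

```typescript
// Illustrative helper, not a library API: cosine similarity of two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "no similarity" rather than NaN.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Vectors pointing the same way score close to 1 and orthogonal vectors score 0, which is why two differently worded questions with the same meaning can land on the same entries.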

The read subflow:

  1. Embed the current userMessage into a vector
  2. Call store.search(identity, queryVector, { k: topK, minScore: threshold })
  3. Format matches as a system message
  4. Inject into the messages slot for this turn
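
The four steps above can be sketched roughly as follows. Only the store.search(identity, queryVector, { k, minScore }) call shape comes from this guide; the types (SketchHit, ChatMessage, SketchEmbedder, SketchStore), the string identity, the synchronous embedder, and the message wording are assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; the library's real types may differ.
type SketchHit = { content: string; score: number };
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
interface SketchEmbedder {
  embed(text: string): number[]; // real embedders are usually async; kept sync for brevity
}
interface SketchStore {
  search(identity: string, queryVector: number[], opts: { k: number; minScore: number }): SketchHit[];
}

function readTopK(
  userMessage: string,
  identity: string,
  deps: { embedder: SketchEmbedder; store: SketchStore },
  opts: { topK: number; threshold: number },
  messages: ChatMessage[],
): ChatMessage[] {
  // 1. Embed the current user message.
  const queryVector = deps.embedder.embed(userMessage);
  // 2. Ask the store for up to topK hits that meet the threshold.
  const hits = deps.store.search(identity, queryVector, { k: opts.topK, minScore: opts.threshold });
  // No hits: inject nothing and leave the messages untouched.
  if (hits.length === 0) return messages;
  // 3. Format the matches as a single system message.
  const formatted: ChatMessage = {
    role: "system",
    content: "Relevant memory:\n" + hits.map((h) => `- ${h.content}`).join("\n"),
  };
  // 4. Inject ahead of the existing messages for this turn.
  return [formatted, ...messages];
}
```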

This is the engine behind defineRAG. RAG is just SEMANTIC × TOP_K with retrieval-tuned defaults (asRole: 'user', topK: 3, threshold: 0.7).

Define a TOP_K memory

const memory = defineMemory({
  id: 'semantic-recall',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: {
    kind: MEMORY_STRATEGIES.TOP_K,
    topK: 3,           // up to 3 most-relevant entries
    threshold: 0.6,    // strict: drop matches below 0.6 cosine
    embedder,
  },
  store,
});

topK caps the number of matches returned. threshold is the minimum cosine similarity for a match to count; it is strict by design (see below). embedder produces the query vector and is called once per turn at read time.

Strict threshold semantics

threshold is strict. When NO entries meet the threshold, NO content is injected: the read stage sets loaded = [], the picker selects nothing, and the formatter emits nothing. There is no fallback that injects weak matches anyway. Low-confidence chunks pollute the prompt and make hallucination MORE likely, not less, so silently injecting weak matches is the wrong default.
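
A minimal sketch of that selection rule, applied to candidates that already carry a score. The ScoredCandidate type and selectStrict function are hypothetical names, and the inclusive comparison (score >= threshold) is an assumption based on the "meet the threshold" wording above.

```typescript
// Illustrative only: strict selection over already-scored candidates.
type ScoredCandidate = { content: string; score: number };

function selectStrict(candidates: ScoredCandidate[], topK: number, threshold: number): ScoredCandidate[] {
  return candidates
    .filter((c) => c.score >= threshold) // below-threshold matches are dropped, never kept as a fallback
    .sort((a, b) => b.score - a.score)   // highest similarity first
    .slice(0, topK);                     // cap at topK
}

// Every candidate is weak, so nothing is selected and nothing would be injected.
const weakCandidates: ScoredCandidate[] = [
  { content: "refund policy", score: 0.41 },
  { content: "shipping times", score: 0.33 },
];
const loaded = selectStrict(weakCandidates, 3, 0.6); // empty array
```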

(The library does NOT throw at runtime on an empty result — "no match" is a safe, valid outcome. The only throws are at configuration time: an unsupported type × strategy combination, or a TOP_K strategy pointed at a store that lacks search().)
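
The idea of failing at configuration time rather than mid-conversation can be sketched as a capability check. This is not the library's actual validation code: SketchMemoryStore and assertSearchable are hypothetical names, and the error message is invented for illustration.

```typescript
// Hypothetical shape; the library's MemoryStore interface may differ.
interface SketchMemoryStore {
  search?: (identity: string, queryVector: number[], opts: { k: number; minScore: number }) => unknown;
}

// Reject a TOP_K configuration up front if the store cannot do vector search.
function assertSearchable(store: SketchMemoryStore, memoryId: string): void {
  if (typeof store.search !== "function") {
    throw new Error(`memory "${memoryId}": TOP_K requires a store that implements search()`);
  }
}
```

Checking once when the memory is defined means a misconfigured store surfaces during startup, while an empty search result at runtime stays a normal, non-throwing outcome.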

Tuning notes:

  • Sentence-BERT-class embedders (all-MiniLM-L6-v2, etc.) often score relevant matches at 0.4–0.6. Use threshold: 0.5 as a starting point.
  • OpenAI text-embedding-3-* and Cohere embed-v3 typically sit comfortably with threshold: 0.7.
  • If you see frequent zero-result silent skips, lower the threshold and observe agentfootprint.context.injected to see which entries fire.
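
One way to pick a threshold is to collect the best-match score for a sample of real queries and see how many reads would come back empty at each candidate value. The helper below is a sketch under that assumption; zeroResultRate and the sample scores are hypothetical, and how you collect the scores depends on your own logging.

```typescript
// Illustrative helper: for each candidate threshold, the share of queries whose
// best match falls below it (those reads would inject nothing).
function zeroResultRate(
  bestScores: number[],
  thresholds: number[],
): { threshold: number; rate: number }[] {
  return thresholds.map((threshold) => {
    const misses = bestScores.filter((s) => s < threshold).length;
    const rate = bestScores.length === 0 ? 0 : misses / bestScores.length;
    return { threshold, rate };
  });
}

// Hypothetical best-match scores gathered from a sample of queries.
const sampleBestScores = [0.72, 0.55, 0.48, 0.63, 0.81];
const sweep = zeroResultRate(sampleBestScores, [0.5, 0.6, 0.7]);
for (const row of sweep) {
  console.log(`threshold ${row.threshold}: ${(row.rate * 100).toFixed(0)}% of reads return nothing`);
}
```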

Embedder identity matters

Each entry written with an embedder is tagged with the embedder's id. Read-side cosine search filters to entries from the SAME embedder when embedderId is set on the search call. Mixing embedders silently corrupts retrieval — vectors from different models live in incompatible spaces.

If you swap embedders (model upgrade, provider switch), either:

  1. Re-embed and re-write all entries (invalidates the old embedderId), or
  2. Run two parallel SEMANTIC memories with different embedderIds during the migration
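
Both ideas, re-embedding on a swap and filtering reads to a single embedder, can be sketched as below. The StoredEntry and IdEmbedder shapes, the synchronous embed call, and the function names are assumptions for illustration; the library's own entry format and write path are not shown in this guide.

```typescript
// Hypothetical shapes for illustration; not the library's real types.
type StoredEntry = { content: string; vector: number[]; embedderId: string };
interface IdEmbedder {
  id: string;
  embed(text: string): number[]; // real embedders are usually async; kept sync for brevity
}

// Option 1: re-embed every entry with the new embedder and re-tag it.
function reembedAll(entries: StoredEntry[], next: IdEmbedder): StoredEntry[] {
  return entries.map((e) => ({
    content: e.content,
    vector: next.embed(e.content),
    embedderId: next.id,
  }));
}

// Read-side guard: only compare the query against vectors from the same embedder.
function sameEmbedderOnly(entries: StoredEntry[], embedderId: string): StoredEntry[] {
  return entries.filter((e) => e.embedderId === embedderId);
}
```

During a migration, the read-side filter keeps old and new vectors from being scored against the same query, since similarity across two models' vector spaces is meaningless.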

Vector store requirements

SEMANTIC × TOP_K requires a MemoryStore that implements the optional search() method. Today:

  • InMemoryStore (dev / tests) — O(n) linear scan, fine for thousands of entries
  • ⏳ pgvector / Pinecone / Qdrant / Weaviate (planned) — true vector indexing

The production adapters (RedisStore, AgentCoreStore, exported from agentfootprint/memory-providers) ship without search() for now: Redis needs the RediSearch module for vector indexing, and AgentCore's retrieval API is server-side and doesn't fit the cosine-similarity contract. Use InMemoryStore until vector-capable production adapters land.

When to use this vs RAG vs other strategies

Situation                                  | Strategy
External docs corpus, retrieve-then-answer | defineRAG (sugar over this)
Conversation memory with cosine recall     | SEMANTIC × TOP_K (this guide)
Conversation memory, just keep recent      | EPISODIC × WINDOW
Conversation memory, distill into facts    | SEMANTIC × EXTRACT

Anti-patterns

  • Don't fall back to top-K-anyway when threshold returns nothing. The library injects nothing by design — empty is the correct, safe outcome.
  • Don't change embedders without re-indexing. Read-side filtering by embedderId is your safety net; rotating embedders without setting embedderId on the search call is a footgun.
  • Don't pass huge entries (50KB content) to a vector store designed for small chunks. Chunk first.
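
For the last point, a simple chunker might look like the sketch below. It is an illustration only: chunkText is a hypothetical helper, it measures size in characters rather than tokens, and production pipelines often add overlap or sentence-aware splitting.

```typescript
// Illustrative chunker: pack paragraphs into chunks of at most maxChars characters.
// A single paragraph longer than maxChars is hard-split into fixed-size slices.
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const chunks: string[] = [];
  let current = "";
  for (const raw of text.split(/\n{2,}/)) {
    const para = raw.trim();
    if (para.length === 0) continue;
    if (para.length > maxChars) {
      // Flush what we have, then slice the oversized paragraph.
      if (current.length > 0) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
    } else if (current.length === 0) {
      current = para;
    } else if (current.length + 2 + para.length <= maxChars) {
      current += "\n\n" + para; // still fits, including the paragraph separator
    } else {
      chunks.push(current);
      current = para;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each chunk would then be embedded and written as its own entry, so a query matches the relevant passage instead of one large document.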
