Semantic retrieval (Top-K)
Cosine-similarity search over embedded entries. Returns the most relevant top-K above a strict threshold — the read-side primitive behind RAG.
A user asks "how do I cancel my subscription?" on turn 1. On turn 47 they ask the same thing differently — "can I get out of the plan?" You'd want the SAME context to load both times, even though the wording differs. Semantic retrieval is what makes that possible: embed both queries, find the same documents in vector space.
What semantic retrieval is
defineMemory({ type: SEMANTIC, strategy: { kind: TOP_K, topK, threshold, embedder } }) — every read embeds the current query, cosine-searches the vector store, returns the top-K most-similar entries that meet the threshold.
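For readers who want the scoring rule spelled out, here is a minimal sketch of cosine similarity over plain number arrays. The function name cosineSimilarity and the zero-vector handling are assumptions made for illustration; the library's internal implementation may differ.

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
// Illustrative helper only; not the library's own function.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; this sketch treats it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

Scores range from -1 (opposite direction) to 1 (same direction), which is why a threshold such as 0.6 or 0.7 acts as a relevance cutoff.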
The read subflow:
- Embed the current userMessage into a vector
- Call store.search(identity, queryVector, { k: topK, minScore: threshold })
- Format matches as a system message
- Inject into the messages slot for this turn
This is the engine behind defineRAG. RAG is just SEMANTIC × TOP_K with retrieval-tuned defaults (asRole: 'user', topK: 3, threshold: 0.7).
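The four read steps can be sketched end to end as follows. Everything here is a stand-in written for illustration: the Entry, Match, and Message types, the keyword-counting embed function, the linear-scan search, and the choice to prepend the system message are all assumptions, not the library's actual API.

```typescript
// Illustrative stand-ins; none of these names come from the library.
type Vector = number[];
interface Entry { id: string; content: string; vector: Vector }
interface Match { entry: Entry; score: number }
interface Message { role: "system" | "user" | "assistant"; content: string }

function cosine(a: Vector, b: Vector): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Toy embedder: counts a few fixed keywords. A real embedder calls a model.
const VOCAB = ["cancel", "subscription", "refund", "password"];
function embed(text: string): Vector {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

// Stand-in for store.search: scan all entries, keep scores at or above
// minScore, sort best first, return at most k.
function search(entries: Entry[], query: Vector, opts: { k: number; minScore: number }): Match[] {
  return entries
    .map((entry) => ({ entry, score: cosine(query, entry.vector) }))
    .filter((m) => m.score >= opts.minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, opts.k);
}

function readTurn(
  userMessage: string,
  entries: Entry[],
  messages: Message[],
  topK: number,
  threshold: number,
): Message[] {
  const queryVector = embed(userMessage);                                         // 1. embed
  const matches = search(entries, queryVector, { k: topK, minScore: threshold }); // 2. search
  if (matches.length === 0) return messages;                                      // nothing clears: inject nothing
  const content =
    "Relevant memory:\n" + matches.map((m) => `- ${m.entry.content}`).join("\n"); // 3. format
  return [{ role: "system", content }, ...messages];                              // 4. inject (placement is assumed)
}
```

With a toy embedder like this, only queries that share keywords with a stored entry will match; a real embedding model is what lets differently worded questions land near the same entry.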
Define a TOP_K memory
const memory = defineMemory({
  id: 'semantic-recall',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: {
    kind: MEMORY_STRATEGIES.TOP_K,
    topK: 3,        // up to 3 most-relevant entries
    threshold: 0.6, // strict: drop matches below 0.6 cosine
    embedder,
  },
  store,
});

topK caps the number of matches returned. threshold is the minimum cosine similarity for a match to count; it is strict by design (see below). embedder produces the query vector and is called once per turn at read time.
Strict threshold semantics
threshold is strict. When NO entries meet the threshold, NO content injects: the read stage sets loaded = [], the picker selects nothing, and the formatter emits nothing. There is no fallback that injects weak matches anyway. Low-confidence chunks pollute the prompt and make hallucination MORE likely, not less, so silently injecting weak matches is the wrong default.
(The library does NOT throw at runtime on an empty result — "no match" is a safe, valid outcome. The only throws are at configuration time: an unsupported type × strategy combination, or a TOP_K strategy pointed at a store that lacks search().)
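The strict rule can be illustrated with a small selection function. This is a sketch under assumptions: the Scored shape and the selectTopK name are hypothetical, and the comparison is taken to be inclusive (a score equal to the threshold counts), following the phrase "meet the threshold".

```typescript
// Hypothetical shape for an already-scored candidate.
interface Scored { id: string; score: number }

// Keep only candidates at or above the threshold, then take the K best.
// There is deliberately no "otherwise return the best K anyway" branch:
// when nothing clears the threshold the result is an empty array.
function selectTopK(scored: Scored[], topK: number, threshold: number): Scored[] {
  return scored
    .filter((s) => s.score >= threshold) // filter returns a new array, so the input is not reordered
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

An empty array is an ordinary return value here, which mirrors the point above that "no match" is a valid outcome rather than an error.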
Tuning notes:
- Sentence-BERT-class embedders (all-MiniLM-L6-v2, etc.) often score relevant matches at 0.4–0.6. Use threshold: 0.5.
- OpenAI text-embedding-3-* and Cohere embed-v3 typically sit comfortably with threshold: 0.7.
- If you see frequent zero-result silent skips, lower the threshold and observe agentfootprint.context.injected to see which entries fire.
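One way to choose a threshold before deploying is to measure how often sample queries would return nothing at each candidate value. The sketch below assumes you have already recorded, for each sample query, the best cosine score it achieved against your store; the function names zeroResultRate and sweepThresholds are hypothetical helpers, not part of the library.

```typescript
// bestScores[i] is the highest cosine score the i-th sample query achieved.
// Returns the fraction of queries that would inject nothing at this threshold.
function zeroResultRate(bestScores: number[], threshold: number): number {
  if (bestScores.length === 0) return 0;
  const misses = bestScores.filter((s) => s < threshold).length;
  return misses / bestScores.length;
}

// Evaluate several candidate thresholds in one pass over the candidates.
function sweepThresholds(
  bestScores: number[],
  candidates: number[],
): { threshold: number; zeroResultRate: number }[] {
  return candidates.map((threshold) => ({
    threshold,
    zeroResultRate: zeroResultRate(bestScores, threshold),
  }));
}
```

A high zero-result rate on queries you know are answerable suggests the threshold is too strict for the embedder in use.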
Embedder identity matters
Each entry written with an embedder is tagged with the embedder's id. Read-side cosine search filters to entries from the SAME embedder when embedderId is set on the search call. Mixing embedders silently corrupts retrieval — vectors from different models live in incompatible spaces.
If you swap embedders (model upgrade, provider switch), either:
- Re-embed and re-write all entries (invalidates the old embedderId), or
- Run two parallel SEMANTIC memories with different embedderIds during the migration
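To see what filtering by embedder means in practice, the sketch below splits stored entries into those the active embedder can search and those left over from another model. The StoredEntry shape, the partitionByEmbedder name, and the example ids are assumptions for illustration only.

```typescript
// Hypothetical stored-entry shape: each vector is tagged with the id of
// the embedder that produced it.
interface StoredEntry { id: string; embedderId: string; vector: number[] }

// Split entries into those comparable with the active embedder's query
// vectors and those from a different vector space (re-embedding candidates).
function partitionByEmbedder(
  entries: StoredEntry[],
  activeEmbedderId: string,
): { searchable: StoredEntry[]; stale: StoredEntry[] } {
  const searchable: StoredEntry[] = [];
  const stale: StoredEntry[] = [];
  for (const entry of entries) {
    (entry.embedderId === activeEmbedderId ? searchable : stale).push(entry);
  }
  return { searchable, stale };
}
```

During a migration, the stale list doubles as the work queue of entries that still need re-embedding.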
Vector store requirements
SEMANTIC × TOP_K requires a MemoryStore that implements the optional search() method. Today:
- ✅ InMemoryStore (dev / tests): O(n) linear scan, fine for thousands of entries
- ⏳ pgvector / Pinecone / Qdrant / Weaviate (planned): true vector indexing
The production adapters (RedisStore, AgentCoreStore, exported from agentfootprint/memory-providers) ship without search() for now: Redis needs the RediSearch module for vector indexing, and AgentCore's retrieval API is server-side and doesn't fit the cosine-similarity contract. Use InMemoryStore until vector-capable production adapters land.
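Because search() is optional on the store interface, a configuration-time check is a natural place to fail fast. The following is a sketch of such a check using a TypeScript type guard; the BasicStore and SearchableStore interfaces, the put method, and the error text are hypothetical and do not reproduce the library's real MemoryStore contract.

```typescript
// Hypothetical interfaces; the real MemoryStore contract may differ.
interface SearchOptions { k: number; minScore: number }
interface BasicStore {
  put(identity: string, content: string): void;
}
interface SearchableStore extends BasicStore {
  search(identity: string, queryVector: number[], opts: SearchOptions): unknown[];
}

// Type guard: true only when the store actually provides a search() function.
function hasSearch(store: BasicStore): store is SearchableStore {
  return typeof (store as Partial<SearchableStore>).search === "function";
}

// Fail at configuration time rather than on the first read.
function assertSearchable(store: BasicStore): SearchableStore {
  if (!hasSearch(store)) {
    throw new Error("TOP_K requires a store that implements search()");
  }
  return store;
}
```

Checking once at setup keeps the per-turn read path free of capability checks.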
When to use this vs RAG vs other strategies
| Situation | Strategy |
|---|---|
| External docs corpus, retrieve-then-answer | defineRAG (sugar over this) |
| Conversation memory with cosine recall | SEMANTIC × TOP_K (this guide) |
| Conversation memory, just keep recent | EPISODIC × WINDOW |
| Conversation memory, distill into facts | SEMANTIC × EXTRACT |
Anti-patterns
- Don't fall back to top-K-anyway when threshold returns nothing. The library injects nothing by design — empty is the correct, safe outcome.
- Don't change embedders without re-indexing. Read-side filtering by embedderId is your safety net; rotating embedders without setting it on the search call is a footgun.
- Don't pass huge entries (50KB content) to a vector store designed for small chunks. Chunk first.
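As a starting point for the "chunk first" advice, here is a minimal whitespace-based chunker. It is a sketch, not a recommended production splitter: the chunkText name and the character-count limit are assumptions, and real pipelines often split on sentences or tokens instead.

```typescript
// Split text into chunks of at most maxChars characters, breaking on
// whitespace where possible. Runs of whitespace collapse to single spaces.
function chunkText(text: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const chunks: string[] = [];
  let current = "";
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  for (const word of words) {
    if (word.length > maxChars) {
      // A single oversized token: flush what we have, then hard-split it.
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (let i = 0; i < word.length; i += maxChars) {
        chunks.push(word.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current ? current + " " + word : word;
    if (candidate.length > maxChars) {
      // Adding this word would overflow: close the chunk and start a new one.
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each resulting chunk would then be embedded and written as its own entry, so retrieval returns focused passages rather than one oversized blob.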
Next steps
- RAG guide — the document-corpus flavor of this primitive
- Memory guide — the full type × strategy matrix
- Memory store adapters — vector-capable backends roadmap
