# Semantic retrieval (Top-K)
A user asks “how do I cancel my subscription?” on turn 1. On turn 47 they ask the same thing differently — “can I get out of the plan?” You’d want the SAME context to load both times, even though the wording differs. Semantic retrieval is what makes that possible: embed both queries, find the same documents in vector space.
## What semantic retrieval is

`defineMemory({ type: SEMANTIC, strategy: { kind: TOP_K, topK, threshold, embedder } })` — every read embeds the current query, cosine-searches the vector store, and returns the top-K most-similar entries that meet the threshold.
The read subflow:

- Embed the current `userMessage` into a vector
- Call `store.search(identity, queryVector, { k: topK, minScore: threshold })`
- Format matches as a system message
- Inject into the messages slot for this turn
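Under toy assumptions, the read subflow above can be sketched as follows. This uses a plain array of pre-embedded entries and a hand-rolled cosine function; the names and shapes are illustrative, not the library's actual internals:

```typescript
// Illustrative entry shape; the real library's entry/store types may differ.
type Entry = { content: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Step 2: score every entry, keep those at or above the threshold,
// return the top-K by similarity (linear scan, like a dev store).
function search(entries: Entry[], queryVector: number[], k: number, minScore: number): Entry[] {
  return entries
    .map((e) => ({ e, score: cosine(e.vector, queryVector) }))
    .filter((m) => m.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((m) => m.e);
}

// Steps 3-4: format matches as a system message for this turn.
function formatAsSystemMessage(matches: Entry[]): { role: 'system'; content: string } {
  return { role: 'system', content: matches.map((m) => m.content).join('\n') };
}
```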
This is the engine behind `defineRAG`. RAG is just SEMANTIC × TOP_K with retrieval-tuned defaults (`asRole: 'user'`, `topK: 3`, `threshold: 0.7`).
## Define a TOP_K memory

```typescript
const memory = defineMemory({
  id: 'semantic-recall',
  type: MEMORY_TYPES.SEMANTIC,
  strategy: {
    kind: MEMORY_STRATEGIES.TOP_K,
    topK: 3,        // up to 3 most-relevant entries
    threshold: 0.6, // strict: drop matches below 0.6 cosine
    embedder,
  },
  store,
});
```

`topK` caps the number of matches returned. `threshold` is the minimum cosine similarity for a match to count — strict by design (see below). `embedder` produces the query vector and is called per turn at read time.
## Strict threshold semantics

`threshold` is strict: when NO entries meet the threshold, NO content is injected, and the library throws on an empty result by design. Low-confidence garbage chunks pollute the prompt and make hallucination MORE likely, not less — silently injecting weak matches is the wrong default.
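A minimal sketch of that strict contract, assuming a simple match shape; the error type and message here are illustrative, not the library's actual exports:

```typescript
type Match = { content: string; score: number };

// Keep only matches at or above the threshold; fail loudly when none
// qualify instead of silently injecting weak matches.
function applyThreshold(matches: Match[], threshold: number): Match[] {
  const kept = matches.filter((m) => m.score >= threshold);
  if (kept.length === 0) {
    // No "top-K anyway" fallback: the read throws rather than
    // polluting the prompt with low-confidence content.
    throw new Error(`No entries met threshold ${threshold}`);
  }
  return kept;
}
```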
Tuning notes:

- Sentence-BERT-class embedders (`all-MiniLM-L6-v2`, etc.) often score relevant matches at 0.4–0.6. Use `threshold: 0.5`.
- OpenAI `text-embedding-3-*` and Cohere embed-v3 typically sit comfortably with `threshold: 0.7`.
- If you see frequent zero-result skips, lower the threshold and observe `agentfootprint.context.injected` to see which entries fire.
## Embedder identity matters

Each entry written with an embedder is tagged with the embedder's id. Read-side cosine search filters to entries from the SAME embedder when `embedderId` is set on the search call. Mixing embedders silently corrupts retrieval — vectors from different models live in incompatible spaces.
If you swap embedders (model upgrade, provider switch), either:

- Re-embed and re-write all entries (invalidating the old `embedderId`), or
- Run two parallel SEMANTIC memories with different `embedderId`s during the migration
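The two migration paths can be sketched in miniature. This assumes an `embedderId` tag on each entry and a synchronous toy embedder (real embedders are async API calls), so the field names and helper functions are illustrative:

```typescript
type TaggedEntry = { content: string; vector: number[]; embedderId: string };

// Read-side safety net: only consider entries whose vectors came from
// the same embedder as the query vector.
function filterByEmbedder(entries: TaggedEntry[], embedderId: string): TaggedEntry[] {
  return entries.filter((e) => e.embedderId === embedderId);
}

// Migration path 1: re-embed everything under the new model, replacing
// both the vector and the embedder tag.
function reembedAll(
  entries: TaggedEntry[],
  embed: (text: string) => number[],
  newEmbedderId: string,
): TaggedEntry[] {
  return entries.map((e) => ({
    content: e.content,
    vector: embed(e.content),    // new vector space
    embedderId: newEmbedderId,   // invalidates the old tag
  }));
}
```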
## Vector store requirements

SEMANTIC × TOP_K requires a MemoryStore that implements `search()`. Today:

- ✅ `InMemoryStore` (dev / tests) — O(n) linear scan, fine for thousands of entries
- ⏳ pgvector / Pinecone / Qdrant / Weaviate (planned v2.6) — true vector indexing

The v2.3 production adapters (`RedisStore`, `AgentCoreStore`) ship without `search()` for now — Redis needs the RediSearch module for vector indexing, and AgentCore's retrieval API is server-side and doesn't fit the cosine-similarity contract. RAG users on v2.3 should use `InMemoryStore` until vector-capable production adapters land.
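If you want to verify at runtime that a store is usable for SEMANTIC × TOP_K before wiring it up, a defensive guard might look like this; the store shapes below are toy stand-ins, not the actual adapter classes:

```typescript
// Returns true when the store exposes a callable search(), the
// capability SEMANTIC × TOP_K depends on.
function supportsVectorSearch(store: Record<string, unknown>): boolean {
  return typeof store.search === 'function';
}
```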
## When to use this vs RAG vs other strategies

| Situation | Strategy |
|---|---|
| External docs corpus, retrieve-then-answer | defineRAG (sugar over this) |
| Conversation memory with cosine recall | SEMANTIC × TOP_K (this guide) |
| Conversation memory, just keep recent | EPISODIC × WINDOW |
| Conversation memory, distill into facts | SEMANTIC × EXTRACT |
## Anti-patterns

- Don't fall back to top-K-anyway when the threshold returns nothing. The library throws by design.
- Don't change embedders without re-indexing. Read-side `embedderId` filtering is your safety net; rotating embedders without it is a footgun.
- Don't pass huge entries (50KB of content) to a vector store designed for small chunks. Chunk first.
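For the chunk-first advice, here is a deliberately naive fixed-size chunker as a starting point. Production chunkers usually split on sentence or paragraph boundaries and add overlap between chunks; this sketch only illustrates the shape of the step:

```typescript
// Split oversized content into fixed-size pieces before writing each
// piece to the vector store as its own entry.
function chunk(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```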
## Next steps

- RAG guide — the document-corpus flavor of this primitive
- Memory guide — the full type × strategy matrix
- Memory store adapters — vector-capable backends roadmap