RAG
Your support agent confidently tells a customer the refund window is 30 days. The actual policy says 14. The LLM is hallucinating from training data because you never gave it the source. RAG is how you stop the hallucination at its root — embed the user’s question, retrieve the actual policy chunks, inject them into the messages slot, and the LLM answers from the SOURCE instead of memory.
What RAG is
RAG = retrieval-augmented generation. Conceptually:
- Embed the user’s query into a vector
- Search a vector store for top-K most-similar document chunks
- Inject those chunks into the next LLM call’s messages slot
- The LLM answers using the retrieved chunks as context
In agentfootprint, RAG is the same plumbing as defineMemory({ type: SEMANTIC, strategy: TOP_K }) — the same engine, the same observability event (agentfootprint.context.injected with source: 'rag'), the same multi-tenant identity. defineRAG is just a friendlier factory with retrieval-tuned defaults (asRole: 'user', topK: 3, threshold: 0.7).
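The conceptual steps above can be sketched without any library. This is a toy illustration, not agentfootprint's engine: `Chunk`, `Message`, `cosine`, `retrieve`, and `buildMessages` are hypothetical names, the vectors are hand-written stand-ins for embedder output, and the `>=` threshold comparison is an assumption.

```typescript
// Toy illustration of the RAG steps. All names are hypothetical;
// this is not agentfootprint's implementation.
type Chunk = { id: string; text: string; vector: number[] };
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Search: score every chunk, drop those below the threshold, keep the top K.
function retrieve(query: number[], chunks: Chunk[], topK: number, threshold: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosine(query, chunk.vector) }))
    .filter((hit) => hit.score >= threshold) // assumption: inclusive comparison
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((hit) => hit.chunk);
}

// Inject: retrieved chunks go in as user-role context ahead of the question.
// If nothing survived the threshold, only the question is sent.
function buildMessages(question: string, retrieved: Chunk[]): Message[] {
  const context = retrieved.map((c): Message => ({ role: 'user', content: `[doc ${c.id}] ${c.text}` }));
  return [...context, { role: 'user', content: question }];
}

// Hand-written 2-d vectors standing in for real embeddings.
const chunks: Chunk[] = [
  { id: 'refund', text: 'Refund policy: 14 days from delivery.', vector: [0.9, 0.1] },
  { id: 'pricing', text: 'Pro plan costs $20/month.', vector: [0.1, 0.9] },
  { id: 'shipping', text: 'Returns ship back at no cost.', vector: [0.7, 0.7] },
];
const hits = retrieve([1, 0], chunks, 2, 0.5);
const messages = buildMessages('What is the refund window?', hits);
```

With these toy vectors, the refund and shipping chunks clear the 0.5 threshold and the pricing chunk does not, so `messages` holds two context entries followed by the question.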
Define a retriever, attach to an agent
defineRAG({ id, store, embedder, topK?, threshold?, asRole? }) returns a definition. agent.rag(definition) attaches it; it is an alias for .memory(definition), with the same plumbing and clearer intent:
    const docs = defineRAG({
      id: 'product-docs',
      description: 'Product documentation chunks',
      store,
      embedder,
      topK: 2,         // up to 2 most-relevant docs per query
      threshold: 0.5,  // strict: drop weak matches
      asRole: 'user',  // chunks land as user-role context (RAG default)
    });

    const agent = Agent.create({
      provider: provider ?? mock({ reply: 'Refunds are processed within 3 business days.' }),
      model: 'mock',
      maxIterations: 1,
    })
      .system('You answer support questions using the retrieved docs.')
      .rag(docs)
      .build();

The store is a MemoryStore with vector-search support. InMemoryStore works for dev. Production options: pgvector / Pinecone / Qdrant / Weaviate (planned for v2.6; track on the roadmap).
Strict threshold semantics
threshold is strict. When no chunk meets the threshold, NO injection happens and the LLM gets no retrieved context. This is intentional: garbage low-confidence chunks pollute the prompt and make hallucination MORE likely, not less. The library throws on empty by design.
If your threshold is too high and you see frequent zero-result skips, lower it to ~0.5 (sentence-BERT-class embedders often score relevant matches there). The defineRAG JSDoc has provider-specific tuning notes.
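One way to pick a threshold is to measure how often a candidate value would leave a query with no chunks at all. The sketch below is a hypothetical tuning aid, not part of agentfootprint: it assumes you have logged the best similarity score per query yourself, the sample scores are made up, and `zeroResultRate` and `sweep` are illustrative names.

```typescript
// Hypothetical tuning aid. `bestScores` holds the highest similarity score
// observed for each sample query (you would collect these yourself).

// Fraction of queries whose best chunk still falls below the threshold,
// i.e. queries that would get no injected context.
function zeroResultRate(bestScores: number[], threshold: number): number {
  if (bestScores.length === 0) return 0;
  const skipped = bestScores.filter((score) => score < threshold).length;
  return skipped / bestScores.length;
}

// Evaluate several candidate thresholds in one pass.
function sweep(bestScores: number[], thresholds: number[]): Array<{ threshold: number; zeroRate: number }> {
  return thresholds.map((threshold) => ({
    threshold,
    zeroRate: zeroResultRate(bestScores, threshold),
  }));
}

// Made-up scores, for illustration only.
const sampleBestScores = [0.82, 0.64, 0.58, 0.71, 0.49, 0.55, 0.77, 0.61];
const report = sweep(sampleBestScores, [0.5, 0.6, 0.7]);
```

On this made-up sample, moving from 0.7 to 0.5 cuts the share of zero-result queries from 5 in 8 to 1 in 8. Real score distributions depend on the embedder, so run the sweep on your own query log before changing the setting.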
Indexing — indexDocuments
Before queries can return chunks, the store needs documents. indexDocuments(store, embedder, docs) is the seeding helper: it embeds each doc and batches the results into store.putMany(). It caps concurrency at 8 by default to avoid embedder rate limits:
    import { indexDocuments } from 'agentfootprint';

    await indexDocuments(store, embedder, [
      { id: 'doc1', content: 'Refund policy: 14 days from delivery for full refund.' },
      { id: 'doc2', content: 'Pro plan costs $20/month and includes priority support.' },
    ]);

Identity defaults to { conversationId: '_global' }. Multi-tenant footgun: if you index under _global but query under { tenant: 'acme' }, you’ll get zero results silently. Either index per-tenant or query under the same identity. The JSDoc spells out the resolution patterns.
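The identity footgun is easier to see in a toy model. The sketch below only mimics the behaviour described above: `ToyStore`, `identityKey`, and the key scheme are hypothetical and say nothing about how agentfootprint's MemoryStore partitions data internally.

```typescript
// Toy store that partitions entries by identity. Hypothetical; it mimics the
// described behaviour, not agentfootprint's MemoryStore.
type Identity = { tenant?: string; conversationId?: string };
type Entry = { id: string; content: string };

// Stable partition key for an identity (illustrative scheme).
function identityKey(identity: Identity): string {
  return `${identity.tenant ?? ''}|${identity.conversationId ?? ''}`;
}

class ToyStore {
  private partitions = new Map<string, Entry[]>();

  // Append entries to the partition that belongs to this identity.
  putMany(identity: Identity, entries: Entry[]): void {
    const key = identityKey(identity);
    this.partitions.set(key, [...(this.partitions.get(key) ?? []), ...entries]);
  }

  // Only entries written under the same identity are visible.
  list(identity: Identity): Entry[] {
    return this.partitions.get(identityKey(identity)) ?? [];
  }
}

const GLOBAL_IDENTITY: Identity = { conversationId: '_global' };
const toyStore = new ToyStore();
toyStore.putMany(GLOBAL_IDENTITY, [
  { id: 'doc1', content: 'Refund policy: 14 days from delivery for full refund.' },
]);

// Querying under a different identity finds nothing, and raises no error.
const wrongScope = toyStore.list({ tenant: 'acme' });
// Querying under the identity used at index time finds the document.
const rightScope = toyStore.list(GLOBAL_IDENTITY);
```

In this model the mismatch is invisible at the call site: `wrongScope` is just an empty array. An explicit check for empty retrieval during development is one way to catch it early.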
Mocks-first development
For dev, use mockEmbedder(): deterministic vectors, zero API cost. Swap to a real embedder when you ship:
    const embedder = process.env.NODE_ENV === 'production'
      ? openaiEmbedder({ model: 'text-embedding-3-small' }) // planned v2.6
      : mockEmbedder();

Production embedder factories (openaiEmbedder / cohereEmbedder / bedrockEmbedder) are on the v2.6 roadmap. Until then, wire your provider’s embedding API into a small Embedder shim; see src/memory/embedding/types.ts for the interface.
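A shim of this kind might look roughly like the following. The `Embedder` shape here is an assumption made for illustration (a single async `embed` method); the real interface lives in src/memory/embedding/types.ts and may differ. `hashEmbed` and `embedderFrom` are hypothetical helpers, and `fetchVector` stands in for whatever call your provider's SDK exposes.

```typescript
// ASSUMPTION: this Embedder shape is hypothetical. Check
// src/memory/embedding/types.ts for the real interface before copying.
interface Embedder {
  embed(text: string): Promise<number[]>;
}

// Deterministic stand-in for development: hashes character trigrams
// (FNV-1a) into a fixed-size bag-of-trigrams vector, then L2-normalizes it.
function hashEmbed(text: string, dims = 16): number[] {
  const vec: number[] = new Array(dims).fill(0);
  const s = text.toLowerCase();
  for (let i = 0; i + 3 <= s.length; i++) {
    let h = 2166136261; // FNV-1a 32-bit offset basis
    for (let j = i; j < i + 3; j++) {
      h ^= s.charCodeAt(j);
      h = Math.imul(h, 16777619) >>> 0; // FNV prime, kept as unsigned 32-bit
    }
    vec[h % dims] += 1;
  }
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}

// Wrap any function that turns text into a vector.
// `fetchVector` is a placeholder for your provider's SDK or HTTP call.
function embedderFrom(fetchVector: (text: string) => Promise<number[]>): Embedder {
  return { embed: (text) => fetchVector(text) };
}

// Development wiring: no network, same vector for the same text every time.
const devEmbedder: Embedder = embedderFrom(async (text) => hashEmbed(text));
```

Keeping the provider call behind one small function means the swap to a production embedder touches a single line. Whichever embedder you choose, use the same one for indexing and querying.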
Anti-patterns
- Don’t fall back to top-K-anyway when threshold returns nothing. The library throws by design.
- Don’t change embedders between writes and reads: entries are tagged with the embedder used at index time, and swapping silently corrupts retrieval.
- Don’t pass huge documents (e.g. 50 KB); chunk first. RAG quality is dominated by chunk size and chunk boundary choice.
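For the chunking advice in the last item, a minimal paragraph-aware chunker could look like this. It is a sketch under stated assumptions: `chunkText` and `toDocs` are hypothetical helpers, the 800-character default is arbitrary, and only the `{ id, content }` document shape comes from the indexing example in this guide.

```typescript
// Split on blank lines, then pack whole paragraphs into chunks of at most
// maxChars. A paragraph longer than maxChars is hard-split as a last resort.
function chunkText(text: string, maxChars = 800): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = '';
  for (const para of paragraphs) {
    if (para.length > maxChars) {
      // Flush what we have, then cut the oversized paragraph into slices.
      if (current) {
        chunks.push(current);
        current = '';
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current ? `${current}\n\n${para}` : para;
    if (candidate.length <= maxChars) {
      current = candidate; // paragraph still fits in the current chunk
    } else {
      chunks.push(current); // close the chunk at a paragraph boundary
      current = para;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Turn one large document into { id, content } entries ready for indexing.
function toDocs(docId: string, text: string, maxChars = 800): Array<{ id: string; content: string }> {
  return chunkText(text, maxChars).map((content, i) => ({ id: `${docId}#${i}`, content }));
}
```

Breaking at paragraph boundaries keeps each chunk topically coherent, which usually matters more than hitting an exact size. Overlapping windows or sentence-level splitting are common refinements that this sketch leaves out.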
Next steps
- Memory guide: defineMemory covers all 4 types × 7 strategies, including RAG
- Memory store adapters: Redis · AgentCore · planned vector-capable backends