
RAG

defineRAG — sugar over defineMemory(SEMANTIC + TOP_K) with retrieval-friendly defaults. Chunks land in the messages slot when cosine similarity clears the threshold.

Your support agent confidently tells a customer the refund window is 30 days. The actual policy says 14. The LLM is hallucinating from training data because you never gave it the source. RAG is how you stop the hallucination at its root — embed the user's question, retrieve the actual policy chunks, inject them into the messages slot, and the LLM answers from the SOURCE instead of memory.

What RAG is

RAG = retrieval-augmented generation. Conceptually:

  1. Embed the user's query into a vector
  2. Search a vector store for top-K most-similar document chunks
  3. Inject those chunks into the next LLM call's messages slot
  4. The LLM answers using the retrieved chunks as context
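The four steps above can be sketched in a few lines. This is a conceptual illustration only: toyEmbed, cosine, and retrieve are hypothetical helpers written for this example, not agentfootprint APIs, and the keyword-count "embedding" stands in for a real model.

```typescript
// Conceptual sketch only: none of these helpers are agentfootprint APIs.
type Chunk = { id: string; content: string; vector: number[] };

// Toy embedder (assumption): counts occurrences of a few vocabulary words.
const VOCAB = ['refund', 'days', 'plan', 'support'];
function toyEmbed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function retrieve(query: string, chunks: Chunk[], topK: number): Chunk[] {
  const queryVector = toyEmbed(query); // step 1: embed the query
  return chunks
    .map((chunk) => ({ chunk, score: cosine(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score) // step 2: rank by similarity
    .slice(0, topK)
    .map((scored) => scored.chunk);
}

const corpus: Chunk[] = [
  'Refund policy: 14 days from delivery for full refund.',
  'Pro plan costs $20/month and includes priority support.',
].map((content, i) => ({ id: `doc${i + 1}`, content, vector: toyEmbed(content) }));

const question = 'How many days do I have to request a refund?';
const hits = retrieve(question, corpus, 1);

// Step 3: inject the retrieved chunks ahead of the question.
// Step 4 would be the LLM call that receives these messages.
const messages = [
  ...hits.map((h) => ({ role: 'user' as const, content: `[retrieved] ${h.content}` })),
  { role: 'user' as const, content: question },
];
```

Here the refund chunk outranks the pricing chunk, so the question arrives at the model with the policy text in front of it.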

In agentfootprint, RAG is the same plumbing as defineMemory({ type: SEMANTIC, strategy: TOP_K }) — the same engine, the same observability event (agentfootprint.context.injected with source: 'rag'), the same multi-tenant identity. defineRAG is just a friendlier factory with retrieval-tuned defaults (asRole: 'user', topK: 3, threshold: 0.7).

Define a retriever, attach to an agent

defineRAG({ id, store, embedder, description?, embedderId?, topK?, threshold?, asRole? }) returns a MemoryDefinition. agent.rag(definition) attaches it (an alias for .memory(definition): same plumbing, clearer intent). defineRAG throws at construction time if store lacks search(). Pair embedderId here with the same value passed to indexDocuments so that, after an embedder swap, entries written by the old embedder are filtered out of results:

const docs = defineRAG({
  id: 'product-docs',
  description: 'Product documentation chunks',
  store,
  embedder,
  topK: 2,           // up to 2 most-relevant docs per query
  threshold: 0.5,    // strict: drop weak matches
  asRole: 'user',    // chunks land as user-role context (RAG default)
});

const agent = Agent.create({
  provider: provider ?? mock({ reply: 'Refunds are processed within 3 business days.' }),
  model: 'mock',
  maxIterations: 1,
})
  .system('You answer support questions using the retrieved docs.')
  .rag(docs)
  .build();

The store is a MemoryStore with vector-search support: it must implement search(), and defineRAG throws at construction time if it doesn't. InMemoryStore works for dev. Production vector backends (pgvector / Pinecone / Qdrant / Weaviate) are not yet shipped; wire your own MemoryStore adapter against src/memory/store/types.ts until then.

Strict threshold semantics

threshold is strict. When no chunk meets the threshold, NO injection happens and the LLM gets no retrieved context. This is intentional: garbage low-confidence chunks pollute the prompt and make hallucination MORE likely, not less. The library skips injection on an empty result by design rather than falling back to weak matches.

If your threshold is too high and you see frequent zero-result skips, lower to ~0.5 (sentence-BERT-class embedders often score relevant matches there). The defineRAG JSDoc has provider-specific tuning notes.
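A minimal sketch of the strict-threshold rule, assuming "meets the threshold" means score >= threshold. selectChunks is a hypothetical helper written for this example, not the library's implementation, and the scores are made up.

```typescript
type Scored = { id: string; score: number };

// Hypothetical helper: filter by threshold first, then keep the best topK.
// There is deliberately no fallback when the filter leaves nothing.
function selectChunks(scored: Scored[], topK: number, threshold: number): Scored[] {
  return scored
    .filter((s) => s.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Illustrative scores only.
const candidates: Scored[] = [
  { id: 'doc1', score: 0.62 },
  { id: 'doc2', score: 0.55 },
  { id: 'doc3', score: 0.41 },
];

const atDefault = selectChunks(candidates, 3, 0.7); // nothing clears 0.7
const atLowered = selectChunks(candidates, 3, 0.5); // doc1 and doc2 clear 0.5

console.log(atDefault.length === 0 ? 'skip injection' : 'inject', atLowered.map((s) => s.id));
```

With these scores the default threshold injects nothing, while lowering it to 0.5 admits the two strongest chunks and still drops the weakest.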

Indexing — indexDocuments

Before queries can return chunks, the store needs documents. indexDocuments(store, embedder, docs, options?) is the seeding helper — embeds each doc (via embedder.embedBatch when available, else capped-concurrency single calls), then batches into store.putMany(). Returns the number of docs indexed. Each RagDocument is { id, content, metadata? }:

import { indexDocuments } from 'agentfootprint';

const count = await indexDocuments(store, embedder, [
  { id: 'doc1', content: 'Refund policy: 14 days from delivery for full refund.' },
  { id: 'doc2', content: 'Pro plan costs $20/month and includes priority support.', metadata: { topic: 'plans' } },
]);

The optional fourth argument is an IndexDocumentsOptions bag: identity, embedderId, tier, ttlMs, signal, and maxConcurrency (caps concurrent embed calls at 8 by default to avoid embedder rate limits; ignored when the embedder implements embedBatch):

await indexDocuments(store, embedder, docs, {
  identity: { tenant: 'acme' },   // scope the corpus to one tenant
  embedderId: 'openai-3-small',   // tag entries so a later embedder swap is filtered out
  maxConcurrency: 4,
});
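To make the embedding strategy concrete, here is a rough sketch of "batch when available, else capped-concurrency single calls". It is not the library's code: SketchEmbedder and embedAll are hypothetical names, and the embed() method is an assumed shape (only embedBatch is named in the text above).

```typescript
// Assumed embedder shape for illustration; check src/memory/embedding/types.ts
// for the real interface.
interface SketchEmbedder {
  embed(text: string): Promise<number[]>;
  embedBatch?(texts: string[]): Promise<number[][]>;
}

async function embedAll(
  embedder: SketchEmbedder,
  texts: string[],
  maxConcurrency = 8,
): Promise<number[][]> {
  // Prefer a single batch call; maxConcurrency is irrelevant on this path.
  if (embedder.embedBatch) return embedder.embedBatch(texts);

  const results: number[][] = new Array(texts.length);
  let next = 0;

  // Each worker pulls the next unclaimed index until the list is exhausted,
  // so at most `maxConcurrency` embed calls are in flight at once.
  async function worker(): Promise<void> {
    while (next < texts.length) {
      const i = next++;
      results[i] = await embedder.embed(texts[i]);
    }
  }

  const workerCount = Math.min(maxConcurrency, texts.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as the input texts
}
```

The worker-pool shape keeps results in input order without a queue library, and a lower cap trades indexing speed for fewer rate-limit errors.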

Identity defaults to { conversationId: '_global' }. Multi-tenant footgun: if you index under _global but query under { tenant: 'acme' }, you'll get zero results silently. Either index per-tenant (pass options.identity) OR query under the same identity. JSDoc spells out the resolution patterns.
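The footgun is easier to see with a toy store. The sketch below assumes entries are namespaced by an exact identity match; ScopedStore and keyOf are hypothetical and only illustrate the failure mode, not how any shipped MemoryStore resolves identities.

```typescript
type Identity = { tenant?: string; conversationId?: string };

// Hypothetical namespacing: one bucket per exact (tenant, conversationId) pair.
const keyOf = (id: Identity): string =>
  JSON.stringify([id.tenant ?? null, id.conversationId ?? null]);

class ScopedStore {
  private buckets = new Map<string, string[]>();

  put(identity: Identity, content: string): void {
    const key = keyOf(identity);
    this.buckets.set(key, [...(this.buckets.get(key) ?? []), content]);
  }

  list(identity: Identity): string[] {
    // A missing bucket is indistinguishable from an empty corpus: no error.
    return this.buckets.get(keyOf(identity)) ?? [];
  }
}

const GLOBAL: Identity = { conversationId: '_global' };
const demoStore = new ScopedStore();
demoStore.put(GLOBAL, 'Refund policy: 14 days from delivery for full refund.');

const mismatched = demoStore.list({ tenant: 'acme' }); // indexed under _global, queried under acme
const matched = demoStore.list(GLOBAL);

console.log(mismatched.length, matched.length);
```

Nothing throws in the mismatched case, which is why an empty retrieval on a seeded store is worth checking against the identity used at index time.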

Mocks-first development

For dev, use mockEmbedder() — deterministic vectors, zero API cost. Swap to a real embedder when you ship:

const embedder = process.env.NODE_ENV === 'production'
  ? myOpenAIEmbedder({ model: 'text-embedding-3-small' })  // your Embedder shim
  : mockEmbedder();

There are no built-in production embedder factories yet (mockEmbedder is the only shipped Embedder). Wire your provider's embedding API into a small Embedder shim — see src/memory/embedding/types.ts for the interface.

Anti-patterns

  • Don't fall back to top-K-anyway when threshold returns nothing. The library skips injection by design.
  • Don't change embedders between writes and reads. Entries are tagged with the embedder used at index time, and swapping silently corrupts retrieval.
  • Don't pass huge documents (50KB) — chunk first. RAG quality is dominated by chunk size + chunk boundary choice.
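For the chunk-first advice, one simple starting point is fixed-size windows with overlap. chunkText below is a hypothetical helper (the library does not ship it as far as this page says), and character windows are only a baseline; splitting on headings or sentences usually gives better boundaries.

```typescript
// Hypothetical helper: fixed-size character windows that overlap so a sentence
// cut at one boundary still appears whole in a neighbouring chunk.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError('need size > 0 and 0 <= overlap < size');
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Shape each chunk as { id, content, metadata? } so it can be indexed.
// The id scheme and sizes here are illustrative choices.
const policyText = 'Refund policy: 14 days from delivery for full refund. '.repeat(40);
const chunkDocs = chunkText(policyText, 500, 50).map((content, i) => ({
  id: `refund-policy#${i}`,
  content,
  metadata: { source: 'refund-policy', chunk: i },
}));

console.log(chunkDocs.length);
```

The resulting array has the RagDocument shape described above, so it can be handed to indexDocuments in place of one oversized document.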

Next steps

  • Memory guide: defineMemory covers all 4 types × 7 strategies including RAG
  • Memory store adapters: RedisStore · AgentCoreStore · planned vector-capable backends
