RAG

Your support agent confidently tells a customer the refund window is 30 days. The actual policy says 14. The LLM is hallucinating from training data because you never gave it the source. RAG is how you stop the hallucination at its root — embed the user’s question, retrieve the actual policy chunks, inject them into the messages slot, and the LLM answers from the SOURCE instead of memory.

What RAG is

RAG = retrieval-augmented generation. Conceptually:

1. Embed the user’s query into a vector.
2. Search a vector store for the top-K most similar document chunks.
3. Inject those chunks into the next LLM call’s messages slot.
4. The LLM answers using the retrieved chunks as context.
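The four steps above can be sketched without any library. This is a minimal illustration under stated assumptions, not how agentfootprint implements retrieval: `toyEmbed`, `cosineSimilarity`, `retrieveTopK`, `buildMessages`, and the fixed vocabulary are all hypothetical stand-ins, and a real embedder would replace the word-count vectors.

```typescript
// Toy walk-through of embed -> search -> inject. Every name here is hypothetical.
type Chunk = { id: string; content: string };
type Hit = { chunk: Chunk; score: number };
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

const VOCAB = ['refund', 'days', 'plan', 'price', 'support', 'delivery'];

// Stand-in for a real embedder: counts occurrences of each vocabulary word.
function toyEmbed(text: string): number[] {
  const words = text.toLowerCase().split(/[^a-z]+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Steps 1 and 2: embed the query, score every chunk, keep the best topK.
function retrieveTopK(query: string, chunks: Chunk[], topK: number): Hit[] {
  const queryVector = toyEmbed(query);
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryVector, toyEmbed(chunk.content)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Step 3: place the retrieved text ahead of the question as user-role context.
function buildMessages(question: string, hits: Hit[]): Message[] {
  const context = hits.map((h) => h.chunk.content).join('\n');
  return [
    { role: 'user', content: `Retrieved context:\n${context}` },
    { role: 'user', content: question },
  ];
}

const chunks: Chunk[] = [
  { id: 'doc1', content: 'Refund policy: 14 days from delivery for full refund.' },
  { id: 'doc2', content: 'Pro plan costs $20/month and includes priority support.' },
];
const question = 'How many days do I have to request a refund?';
const hits = retrieveTopK(question, chunks, 1);
const messages = buildMessages(question, hits);
```

Step 4 is the model call itself: `messages` now carries the policy text, so the answer can come from the retrieved chunk rather than from training data.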

In agentfootprint, RAG is the same plumbing as defineMemory({ type: SEMANTIC, strategy: TOP_K }) — the same engine, the same observability event (agentfootprint.context.injected with source: 'rag'), the same multi-tenant identity. defineRAG is just a friendlier factory with retrieval-tuned defaults (asRole: 'user', topK: 3, threshold: 0.7).

Define a retriever, attach to an agent

defineRAG({ id, store, embedder, topK?, threshold?, asRole? }) returns a definition. agent.rag(definition) attaches it (alias for .memory(definition) — same plumbing, clearer intent):

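// Assumes these are exported from the package root, as indexDocuments is.
import { Agent, defineRAG, mock } from 'agentfootprint';

const docs = defineRAG({
  id: 'product-docs',
  description: 'Product documentation chunks',
  store,             // a MemoryStore with vector-search support
  embedder,          // must be the same embedder used at index time
  topK: 2,           // up to 2 most-relevant chunks per query
  threshold: 0.5,    // strict cutoff: chunks scoring below 0.5 are never injected
  asRole: 'user',    // chunks land as user-role context (RAG default)
});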

const agent = Agent.create({
  provider: provider ?? mock({ reply: 'Refunds are processed within 3 business days.' }),
  model: 'mock',
  maxIterations: 1,
})
  .system('You answer support questions using the retrieved docs.')
  .rag(docs)
  .build();

The store is a MemoryStore with vector-search support. InMemoryStore works for development. Production vector backends (pgvector, Pinecone, Qdrant, Weaviate) are planned for v2.6; track them on the roadmap.

Strict threshold semantics

threshold is a strict cutoff. When no chunk meets it, nothing is injected and the LLM gets no retrieved context. This is intentional: low-confidence chunks pollute the prompt and make hallucination more likely, not less. The library skips injection rather than falling back to weaker matches.

If your threshold is too high and you see frequent zero-result skips, lower it to around 0.5 (sentence-BERT-class embedders often score relevant matches there). The defineRAG JSDoc has provider-specific tuning notes.
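The no-fallback rule can be illustrated with a small filter. This is a sketch only: `selectChunks` and `contextMessages` are hypothetical helpers, the scores are made up for illustration, and treating the threshold as inclusive (`>=`) is an assumption rather than documented library behavior.

```typescript
// Hypothetical helpers showing "filter by threshold, never fall back".
type ScoredChunk = { id: string; content: string; score: number };
type ContextMessage = { role: 'user'; content: string };

// Keep only chunks at or above the threshold (assumed inclusive), best first.
function selectChunks(scored: ScoredChunk[], topK: number, threshold: number): ScoredChunk[] {
  return scored
    .filter((c) => c.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Build the messages to inject; an empty selection injects nothing at all.
function contextMessages(selected: ScoredChunk[]): ContextMessage[] {
  if (selected.length === 0) return []; // no weak-match fallback
  return [{ role: 'user', content: selected.map((c) => c.content).join('\n\n') }];
}

// Illustrative scores, not measured values.
const scored: ScoredChunk[] = [
  { id: 'doc1', content: 'Refund policy: 14 days from delivery for full refund.', score: 0.62 },
  { id: 'doc2', content: 'Pro plan costs $20/month and includes priority support.', score: 0.31 },
];

const strict = contextMessages(selectChunks(scored, 3, 0.7));  // nothing qualifies
const relaxed = contextMessages(selectChunks(scored, 3, 0.5)); // only doc1 qualifies
```

With the cutoff at 0.7 the relevant chunk is skipped along with the irrelevant one; at 0.5 it gets through while the unrelated pricing chunk stays out.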

Indexing — `indexDocuments`

Before queries can return chunks, the store needs documents. indexDocuments(store, embedder, docs) is the seeding helper: it embeds each doc and batches the results into store.putMany(). It caps concurrency at 8 by default to avoid embedder rate limits:

import { indexDocuments } from 'agentfootprint';

await indexDocuments(store, embedder, [
  { id: 'doc1', content: 'Refund policy: 14 days from delivery for full refund.' },
  { id: 'doc2', content: 'Pro plan costs $20/month and includes priority support.' },
]);
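To see what a concurrency cap means in practice, here is a minimal worker-pool sketch. It is not the library's implementation: `mapWithConcurrency` and `embedAll` are hypothetical names, and the `embed` callback stands in for whatever embedding call you use.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new RangeError('limit must be at least 1');
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function run(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T, i);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results; // same order as `items`
}

// Usage sketch: embed many texts while keeping at most 8 requests in flight.
async function embedAll(
  texts: string[],
  embed: (text: string) => Promise<number[]>,
  limit = 8,
): Promise<number[][]> {
  return mapWithConcurrency(texts, limit, (text) => embed(text));
}
```

A pool like this keeps throughput steady without sending every document to the embedder at once, which is the usual cause of rate-limit errors during bulk indexing.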

Identity defaults to { conversationId: '_global' }. Multi-tenant footgun: if you index under _global but query under { tenant: 'acme' }, you’ll silently get zero results, with no error to point at the mismatch. Either index per tenant or query under the same identity you indexed with. The JSDoc spells out the resolution patterns.
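The identity mismatch is easy to reproduce with a toy store. The sketch below is not agentfootprint's MemoryStore: `ToyNamespacedStore` is hypothetical, and it assumes identities are matched exactly as namespaces, which is a simplification of whatever the real store does.

```typescript
// Hypothetical store that buckets documents by an exact-match identity key.
type Identity = Record<string, string>;
type Doc = { id: string; content: string };

class ToyNamespacedStore {
  private readonly buckets = new Map<string, Doc[]>();

  // Serialize the identity deterministically so key order does not matter.
  private static key(identity: Identity): string {
    return Object.keys(identity)
      .sort()
      .map((k) => `${k}=${identity[k]}`)
      .join('&');
  }

  put(identity: Identity, doc: Doc): void {
    const key = ToyNamespacedStore.key(identity);
    const bucket = this.buckets.get(key) ?? [];
    bucket.push(doc);
    this.buckets.set(key, bucket);
  }

  list(identity: Identity): Doc[] {
    return this.buckets.get(ToyNamespacedStore.key(identity)) ?? [];
  }
}

const GLOBAL_IDENTITY: Identity = { conversationId: '_global' };
const ACME_IDENTITY: Identity = { tenant: 'acme' };

const toyStore = new ToyNamespacedStore();
toyStore.put(GLOBAL_IDENTITY, { id: 'doc1', content: 'Refund policy: 14 days from delivery.' });

const missed = toyStore.list(ACME_IDENTITY);  // empty, and nothing signals why
const found = toyStore.list(GLOBAL_IDENTITY); // the document indexed above
```

Nothing fails in the mismatched lookup, which is why the problem tends to surface only as a retriever that never injects anything.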

Mocks-first development

For dev, use mockEmbedder() — deterministic vectors, zero API cost. Swap to a real embedder when you ship:

const embedder = process.env.NODE_ENV === 'production'
  ? openaiEmbedder({ model: 'text-embedding-3-small' })  // planned v2.6
  : mockEmbedder();

Production embedder factories (openaiEmbedder / cohereEmbedder / bedrockEmbedder) are on the v2.6 roadmap. Until then, wire your provider’s embedding API into a small Embedder shim — see src/memory/embedding/types.ts for the interface.

Anti-patterns

Don’t fall back to top-K-anyway when the threshold returns nothing. The library skips injection by design.
Don’t change embedders between writes and reads. Entries are tagged with the embedder used at index time, and swapping embedders silently corrupts retrieval.
Don’t pass huge documents (for example, 50 KB) as a single entry. Chunk first: RAG quality is dominated by chunk size and chunk boundary choice.
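As one way to chunk before indexing, the sketch below packs paragraphs into chunks under a character budget. It is an illustrative approach, not a recommendation from the library: `chunkByParagraph` and `toDocs` are hypothetical helpers, and the right budget depends on your embedder and content.

```typescript
// Hypothetical chunker: pack whole paragraphs into chunks of at most maxChars.
function chunkByParagraph(text: string, maxChars: number): string[] {
  if (maxChars < 1) throw new RangeError('maxChars must be at least 1');
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = '';
  for (const para of paragraphs) {
    // A paragraph longer than the budget is hard-split into fixed-size pieces.
    const pieces: string[] = [];
    for (let i = 0; i < para.length; i += maxChars) {
      pieces.push(para.slice(i, i + maxChars));
    }
    for (const piece of pieces) {
      const candidate = current === '' ? piece : `${current}\n\n${piece}`;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits: keep packing
      } else {
        chunks.push(current); // budget exceeded: close the chunk
        current = piece;
      }
    }
  }
  if (current !== '') chunks.push(current);
  return chunks;
}

// Shape each chunk as an { id, content } entry, matching the docs passed to indexDocuments.
function toDocs(prefix: string, chunks: string[]): Array<{ id: string; content: string }> {
  return chunks.map((content, i) => ({ id: `${prefix}-${i}`, content }));
}
```

Splitting on paragraph boundaries keeps related sentences together, so a retrieved chunk is more likely to carry a complete statement of the policy it describes.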

Next steps

Memory guide — defineMemory covers all 4 types × 7 strategies including RAG
Memory store adapters — Redis · AgentCore · planned vector-capable backends
