Reduce hallucination by giving the LLM the source material — and recording what it produced vs what it was given. The trace IS the grounding evidence.

Your agent answers a customer's policy question. The customer screenshots the answer and emails your CEO: "I asked your bot whether refunds were available; it said no. Now I'm seeing they ARE available — what's going on?" You go to look. The agent might have hallucinated. Without grounding evidence captured at the moment of the answer, you cannot tell. agentfootprint's typed event stream + memory snapshots make grounding evidence a first-class artifact, not a thing you reconstruct.

What grounding means here

Grounding = constraining the LLM to answer from supplied source material, not from training-data memory. Two complementary mechanisms:

Source injection — the source material (docs, policy text, retrieved chunks, prior decisions) lands in the messages slot via RAG, Memory, or defineFact. The LLM sees it as fresh context.
Source observation — the framework records what was injected (event: agentfootprint.context.injected) and what the LLM produced (event: agentfootprint.stream.llm_end). Side-by-side comparison is the grounding audit.

Together they form a verifiable answer: not "the LLM said X", but "the LLM said X having been given Y as the source material."

RAG is the standard pattern for source injection (see the RAG guide). defineRAG retrieves the top-K matching chunks that clear a strict threshold; if no chunk clears it, NO chunks are injected (the LLM gets no retrieved context). This is intentional: low-confidence chunks make hallucination MORE likely, not less.

const docs = defineRAG({
  id: 'policy-docs',
  store, embedder,
  topK: 3, threshold: 0.7,  // strict — no chunks if nothing clears 0.7
});
agent.rag(docs);

When chunks DO match, they land in the messages slot tagged with source: 'rag'. Observability surfaces show exactly which chunks were retrieved per turn.
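The strict-threshold behavior can be illustrated in isolation. The following is a minimal sketch, not agentfootprint's implementation: the ScoredChunk type and the selectChunks helper are hypothetical names introduced here, and treating a score exactly equal to the threshold as a match is an assumption.

```typescript
// Hypothetical shape of a scored retrieval hit; not agentfootprint's actual type.
interface ScoredChunk {
  id: string;
  text: string;
  score: number; // similarity score, assumed to lie in [0, 1]
}

// Keep only chunks at or above the threshold, then take the best topK.
// Returns an empty array when nothing clears the bar, so nothing is injected.
function selectChunks(hits: ScoredChunk[], topK: number, threshold: number): ScoredChunk[] {
  return hits
    .filter((hit) => hit.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const hits: ScoredChunk[] = [
  { id: 'refund-policy', text: 'Refunds are available within 30 days.', score: 0.82 },
  { id: 'shipping-faq', text: 'Orders ship in 2 business days.', score: 0.41 },
];

const strong = selectChunks(hits, 3, 0.7); // only the refund-policy chunk clears 0.7
const none = selectChunks(hits, 3, 0.9);   // nothing clears 0.9, so the result is empty
```

The point of the empty result is that "no context" is a distinct, observable outcome rather than a silent fallback to weak matches.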

Causal memory as grounding for follow-ups

For cross-run grounding (the customer asks "why did you tell me that earlier?"), Causal memory persists snapshots of past runs. New questions are matched against past queries by cosine similarity; a matching snapshot injects the stored run as context. The follow-up answer is grounded in what actually happened, not in a reconstruction.

This is the differentiator no other framework has. See Memory guide § Causal memory.
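To make the matching step concrete, here is a small sketch of cosine matching between a new question and stored run snapshots. It only illustrates the idea: RunSnapshot, matchSnapshots, the toy two-dimensional embeddings, and the 0.8 threshold are all assumptions made for this example, not the library's storage format or defaults.

```typescript
// Hypothetical snapshot record: a past query, its embedding, and the answer given.
interface RunSnapshot {
  query: string;
  queryEmbedding: number[];
  answer: string;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length');
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  if (normA === 0 || normB === 0) return 0;
  return dot / (normA * normB);
}

// Return the snapshots whose stored query is similar enough to the new question.
function matchSnapshots(
  questionEmbedding: number[],
  snapshots: RunSnapshot[],
  threshold: number,
): RunSnapshot[] {
  return snapshots.filter(
    (s) => cosineSimilarity(questionEmbedding, s.queryEmbedding) >= threshold,
  );
}

const pastRuns: RunSnapshot[] = [
  { query: 'Are refunds available?', queryEmbedding: [1, 0], answer: 'No.' },
  { query: 'How long does shipping take?', queryEmbedding: [0, 1], answer: '2 days.' },
];

// Toy embedding for "why did you tell me refunds were unavailable?"
const matches = matchSnapshots([0.9, 0.1], pastRuns, 0.8);
```

In a real deployment the embeddings would come from the same embedder used at write time; comparing vectors from different embedders gives meaningless scores.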

Recording what was injected

Subscribe to agentfootprint.context.injected to capture every injection in real time:

agent.on('agentfootprint.context.injected', (e) => {
  // e.payload — { source, slot, contentSummary, rawContent?, asRole?, sourceId?, reason, ... }
  // source: 'steering' | 'instructions' | 'skill' | 'fact' | 'memory' | 'rag' | ...
  // rawContent is the full injected source (may be redacted); contentSummary is always present.
  const content = e.payload.rawContent ?? e.payload.contentSummary;
  audit.log({ when: Date.now(), source: e.payload.source, content });
});

Every flavor of context engineering routes through the SAME event channel — the audit log is uniform across Skill, Steering, Instruction, Fact, Memory, RAG. One event taxonomy for context observability; no per-feature integration glue.

Recording what the LLM produced

agentfootprint.stream.llm_end carries the LLMEndPayload — content, toolCallCount, usage, stopReason, durationMs, iteration. Pair with context.injected events to produce grounding triplets:

// For each turn, given that turn's question and its collected events:
const injectedChunks = events.filter((e) => e.type === 'agentfootprint.context.injected');
const llmOutput = events.find((e) => e.type === 'agentfootprint.stream.llm_end');
const triplet = { question, injectedChunks, llmOutput };

That triplet IS the grounding evidence. Persist it (S3, Postgres, your audit trail) and you can show six months later exactly what the LLM saw and what it said.
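A typed, self-contained version of that pairing might look like the sketch below. The event interfaces are deliberately minimal stand-ins (the real payloads carry more fields), and buildTriplet, GroundingTriplet, and recordedAt are hypothetical names. It also takes the last llm_end of the turn, on the assumption that a turn with tool calls can emit one per iteration.

```typescript
// Hypothetical, minimal event shapes for illustration only.
interface InjectedEvent {
  type: 'agentfootprint.context.injected';
  payload: { source: string; contentSummary: string; rawContent?: string; sourceId?: string };
}
interface LlmEndEvent {
  type: 'agentfootprint.stream.llm_end';
  payload: { content: string; iteration: number };
}
type TurnEvent = InjectedEvent | LlmEndEvent;

interface GroundingTriplet {
  question: string;
  injected: InjectedEvent['payload'][];
  output: string | null; // null if the turn never reached llm_end
  recordedAt: string;    // ISO timestamp taken when the triplet is built
}

function isInjected(e: TurnEvent): e is InjectedEvent {
  return e.type === 'agentfootprint.context.injected';
}
function isLlmEnd(e: TurnEvent): e is LlmEndEvent {
  return e.type === 'agentfootprint.stream.llm_end';
}

// Pair everything injected during a turn with the final LLM output of that turn.
function buildTriplet(question: string, events: TurnEvent[], now: Date = new Date()): GroundingTriplet {
  const injected = events.filter(isInjected).map((e) => e.payload);
  const ends = events.filter(isLlmEnd);
  const last = ends.length > 0 ? ends[ends.length - 1] : undefined;
  return {
    question,
    injected,
    output: last ? last.payload.content : null,
    recordedAt: now.toISOString(),
  };
}

const turnEvents: TurnEvent[] = [
  {
    type: 'agentfootprint.context.injected',
    payload: {
      source: 'rag',
      sourceId: 'refund-policy',
      contentSummary: 'Refund policy, section 2',
      rawContent: 'Refunds are available within 30 days.',
    },
  },
  {
    type: 'agentfootprint.stream.llm_end',
    payload: { content: 'Yes, refunds are available within 30 days.', iteration: 1 },
  },
];

const triplet = buildTriplet('Are refunds available?', turnEvents);
const record = JSON.stringify(triplet); // one serializable record for your audit store
```

Serializing to a plain JSON record keeps the evidence independent of the framework version that produced it, which matters when the audit happens months later.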

Anti-patterns

Don't claim "grounded" without persisting the triplet. Without injection-content + LLM-output captured at the moment, you have NO grounding evidence — just logs.
Don't trust the LLM to cite sources from memory. It will hallucinate URLs / doc IDs. Cite by injecting the source AND requiring the LLM to repeat the injected source ID in its answer (verifiable via comparison).
Don't mix high-confidence and low-confidence sources without flagging. If RAG returns chunks at score 0.9 and 0.5, the LLM treats them equally. Either threshold strictly OR use the injection role to mark uncertainty.
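The citation check from the second anti-pattern can be sketched as a plain comparison between the IDs the answer cites and the IDs that were actually injected. The `[source:<id>]` citation format and the helper names below are assumptions made for this example; use whatever format your prompt instructs the LLM to emit.

```typescript
// Extract IDs from citations written as [source:some-id] (a hypothetical format).
function extractCitedIds(answer: string): string[] {
  const pattern = /\[source:([A-Za-z0-9_-]+)\]/g;
  const ids: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    if (match[1] !== undefined) ids.push(match[1]);
  }
  return ids;
}

interface CitationCheck {
  cited: string[];   // every ID the answer cites
  unknown: string[]; // cited IDs that were never injected (likely hallucinated)
  grounded: boolean; // at least one citation, and none of them unknown
}

// Compare the IDs cited in the answer against the IDs injected for this turn.
function checkCitations(answer: string, injectedIds: string[]): CitationCheck {
  const cited = extractCitedIds(answer);
  const unknown = cited.filter((id) => injectedIds.indexOf(id) === -1);
  return { cited, unknown, grounded: cited.length > 0 && unknown.length === 0 };
}

const injectedIds = ['refund-policy'];

const good = checkCitations(
  'Refunds are available within 30 days [source:refund-policy].',
  injectedIds,
);
const bad = checkCitations(
  'Refunds are not available [source:returns-2019].',
  injectedIds,
);
```

A check like this only confirms that a cited ID was injected; it does not confirm that the answer agrees with the cited text, which still needs the persisted triplet for review.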

Next steps

RAG guide — the strict-threshold retrieval pattern
Memory guide — Causal memory for cross-run grounding
Observability guide — the full 59-event taxonomy
