Grounding
Reduce hallucination by giving the LLM the source material — and recording what it produced vs what it was given. The trace IS the grounding evidence.
Your agent answers a customer's policy question. The customer screenshots the answer and emails your CEO: "I asked your bot whether refunds were available; it said no. Now I'm seeing they ARE available — what's going on?" You go to look. The agent might have hallucinated. Without grounding evidence captured at the moment of the answer, you cannot tell. agentfootprint's typed event stream + memory snapshots make grounding evidence a first-class artifact, not a thing you reconstruct.
What grounding means here
Grounding = constraining the LLM to answer from supplied source material, not from training-data memory. Two complementary mechanisms:
- Source injection: the source material (docs, policy text, retrieved chunks, prior decisions) lands in the messages slot via RAG, Memory, or defineFact. The LLM sees it as fresh context.
- Source observation: the framework records what was injected (event: agentfootprint.context.injected) and what the LLM produced (event: agentfootprint.stream.llm_end). Comparing the two side by side is the grounding audit.
Together they form a verifiable answer: not "the LLM said X", but "the LLM said X having been given Y as the source material."
RAG as grounding
RAG is the standard grounding pattern (the RAG guide covers it in full). defineRAG retrieves the top-K matching chunks that clear a strict score threshold; if nothing clears the threshold, NO chunks are injected and the LLM gets no retrieved context. This is intentional: low-confidence garbage chunks make hallucination MORE likely, not less.
```typescript
// `store` (vector store) and `embedder` are assumed to be configured elsewhere.
const docs = defineRAG({
  id: 'policy-docs',
  store,
  embedder,
  topK: 3,
  threshold: 0.7, // strict: no chunks inject if nothing clears 0.7
});

agent.rag(docs);
```

When chunks DO match, they land in the messages slot tagged with source: 'rag'. Observability surfaces show exactly which chunks were retrieved on each turn.
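The gating rule can be illustrated as a pure function. This is a hypothetical re-creation of the top-K-above-threshold behavior described above, not defineRAG's source; `ScoredChunk` and `selectChunks` are names invented for the example, and it assumes a chunk "clears" the threshold when its score is greater than or equal to it.

```typescript
// Hypothetical shape of a retrieved candidate with its similarity score.
interface ScoredChunk {
  id: string;
  text: string;
  score: number; // higher = more similar to the query
}

// Strict top-K retrieval sketch: drop everything below the threshold,
// rank the survivors best-first, and cap at topK. If nothing clears the
// threshold, the result is empty, so nothing would be injected.
function selectChunks(
  candidates: ScoredChunk[],
  topK: number,
  threshold: number,
): ScoredChunk[] {
  return candidates
    .filter((c) => c.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const candidates: ScoredChunk[] = [
  { id: 'refund-policy', text: 'Refund policy text...', score: 0.91 },
  { id: 'shipping-policy', text: 'Shipping policy text...', score: 0.52 },
  { id: 'refund-faq', text: 'Refund FAQ text...', score: 0.74 },
];

console.log(selectChunks(candidates, 3, 0.7).map((c) => c.id)); // [ 'refund-policy', 'refund-faq' ]
console.log(selectChunks(candidates, 3, 0.95)); // []
```

The empty result in the second call is the point of the design: with no chunk above 0.95, the model answers without retrieved context rather than from a weak match.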
Causal memory as grounding for follow-ups
For cross-run grounding (the customer asks "why did you tell me that earlier?"), Causal memory persists snapshots of past runs. New questions are cosine-matched against past queries, and matching snapshots inject the stored run. The follow-up answer is grounded in what actually happened, not in an after-the-fact reconstruction.
This is the differentiator no other framework has. See Memory guide § Causal memory.
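The cosine-matching step can be illustrated in isolation. The sketch below is not agentfootprint's Causal memory implementation: `RunSnapshot`, `findGroundingSnapshot`, and the 0.8 threshold are hypothetical, and the three-dimensional vectors are hand-written stand-ins for real embedder output.

```typescript
// Hypothetical stored snapshot of a past run.
interface RunSnapshot {
  runId: string;
  query: string;
  queryEmbedding: number[];
  answer: string;
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the best-matching past run if its query clears the threshold, else undefined.
function findGroundingSnapshot(
  queryEmbedding: number[],
  snapshots: RunSnapshot[],
  threshold: number,
): RunSnapshot | undefined {
  let best: RunSnapshot | undefined;
  let bestScore = -Infinity;
  for (const s of snapshots) {
    const score = cosine(queryEmbedding, s.queryEmbedding);
    if (score > bestScore) {
      best = s;
      bestScore = score;
    }
  }
  return bestScore >= threshold ? best : undefined;
}

const snapshots: RunSnapshot[] = [
  { runId: 'run-41', query: 'Are refunds available?', queryEmbedding: [0.9, 0.1, 0.0], answer: 'No.' },
  { runId: 'run-42', query: 'When will my order ship?', queryEmbedding: [0.0, 0.2, 0.9], answer: 'In 2 days.' },
];

// A follow-up such as "why did you say no refunds?" embeds close to run-41's query.
const match = findGroundingSnapshot([0.85, 0.2, 0.05], snapshots, 0.8);
console.log(match?.runId); // run-41
```

Returning `undefined` below the threshold mirrors the strict-threshold reasoning from the RAG section: no stored run is better than the wrong one.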
Recording what was injected
Subscribe to agentfootprint.context.injected to capture every injection in real time:
```typescript
// `agent` is your configured agent; `audit` is your own audit-log sink.
agent.on('agentfootprint.context.injected', (e) => {
  // e.payload: { source, slot, contentSummary, rawContent?, asRole?, sourceId?, reason, ... }
  // source: 'steering' | 'instructions' | 'skill' | 'fact' | 'memory' | 'rag' | ...
  // rawContent is the full injected source (may be redacted); contentSummary is always present.
  const content = e.payload.rawContent ?? e.payload.contentSummary;
  audit.log({ when: Date.now(), source: e.payload.source, content });
});
```

Every flavor of context engineering routes through the SAME event channel, so the audit log is uniform across Skill, Steering, Instruction, Fact, Memory, and RAG. One event taxonomy for context observability; no per-feature integration glue.
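The handler above calls an `audit.log` sink that you supply. A minimal in-memory version might look like the sketch below. `InjectionEvent` mirrors only the payload fields named in the handler's comments (`source`, `contentSummary`, `rawContent`) and is an assumption, not agentfootprint's exported type; `InMemoryAudit` is a hypothetical stand-in for your real store. The events are simulated, so the snippet runs without the framework.

```typescript
// Local type covering only the documented fields this sketch reads.
interface InjectionEvent {
  type: 'agentfootprint.context.injected';
  payload: { source: string; contentSummary: string; rawContent?: string };
}

interface AuditEntry {
  when: number;
  source: string;
  content: string;
}

// Hypothetical audit sink: appends entries and can tally them by source.
class InMemoryAudit {
  readonly entries: AuditEntry[] = [];

  log(entry: AuditEntry): void {
    this.entries.push(entry);
  }

  countBySource(): Record<string, number> {
    const counts: Record<string, number> = {};
    for (const e of this.entries) {
      counts[e.source] = (counts[e.source] ?? 0) + 1;
    }
    return counts;
  }
}

const audit = new InMemoryAudit();

// Same logic as the handler above: prefer rawContent, fall back to the summary.
function onInjected(e: InjectionEvent): void {
  const content = e.payload.rawContent ?? e.payload.contentSummary;
  audit.log({ when: Date.now(), source: e.payload.source, content });
}

// Simulated events standing in for what agent.on(...) would deliver.
onInjected({
  type: 'agentfootprint.context.injected',
  payload: { source: 'rag', contentSummary: 'refund-policy chunk', rawContent: 'Full refund policy text' },
});
onInjected({
  type: 'agentfootprint.context.injected',
  payload: { source: 'instructions', contentSummary: 'tone instructions' },
});

console.log(audit.countBySource()); // { rag: 1, instructions: 1 }
```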
Recording what the LLM produced
agentfootprint.stream.llm_end carries the LLMEndPayload: content, toolCallCount, usage, stopReason, durationMs, iteration. Pair it with context.injected events to produce grounding triplets. For each turn:

```typescript
// `events` holds the events captured for one turn; `question` is the user's input.
const injectedChunks = events.filter((e) => e.type === 'agentfootprint.context.injected');
const llmOutput = events.find((e) => e.type === 'agentfootprint.stream.llm_end');
const triplet = { question, injectedChunks, llmOutput };
```

That triplet IS the grounding evidence. Persist it (S3, Postgres, your audit trail) and six months later you can show exactly what the LLM saw and what it said.
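A self-contained version of that pairing, assuming one turn's events are already buffered in an array. `TurnEvent`, `GroundingTriplet`, and `buildTriplet` are hypothetical, and the event shapes carry only fields named in this guide. One deliberate difference from a plain `find`: this sketch keeps the LAST `llm_end` of the turn, on the assumption that a turn with tool calls produces several LLM calls (note the `iteration` field) and the final one carries the answer. Adjust if your turn boundaries differ.

```typescript
type InjectedEvent = {
  type: 'agentfootprint.context.injected';
  payload: { source: string; contentSummary: string; rawContent?: string };
};
type LlmEndEvent = {
  type: 'agentfootprint.stream.llm_end';
  payload: { content: string; stopReason: string; iteration: number };
};
type TurnEvent = InjectedEvent | LlmEndEvent;

interface GroundingTriplet {
  question: string;
  injected: InjectedEvent['payload'][];
  llmOutput: LlmEndEvent['payload'] | undefined;
}

function buildTriplet(question: string, events: TurnEvent[]): GroundingTriplet {
  const injected = events
    .filter((e): e is InjectedEvent => e.type === 'agentfootprint.context.injected')
    .map((e) => e.payload);
  const ends = events.filter((e): e is LlmEndEvent => e.type === 'agentfootprint.stream.llm_end');
  // Assumption: the last llm_end of the turn carries the final answer.
  const last = ends[ends.length - 1];
  return { question, injected, llmOutput: last?.payload };
}

// Illustrative turn: one RAG injection, a tool-calling LLM call, then the answer.
// The stopReason strings are placeholders, not the framework's actual values.
const turn: TurnEvent[] = [
  { type: 'agentfootprint.context.injected', payload: { source: 'rag', contentSummary: 'refund-policy chunk', rawContent: 'Full refund policy text' } },
  { type: 'agentfootprint.stream.llm_end', payload: { content: '', stopReason: 'tool_call', iteration: 1 } },
  { type: 'agentfootprint.stream.llm_end', payload: { content: 'Yes, refunds are available.', stopReason: 'done', iteration: 2 } },
];

const triplet = buildTriplet('Are refunds available?', turn);
// Serialize for your audit store (an S3 object, a Postgres JSON column, ...).
const record = JSON.stringify(triplet);
console.log(triplet.llmOutput?.content); // Yes, refunds are available.
```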
Anti-patterns
- Don't claim "grounded" without persisting the triplet. Without the injected content and the LLM output captured at the moment of the answer, you have NO grounding evidence, just logs.
- Don't trust the LLM to cite sources from memory. It will hallucinate URLs and doc IDs. Instead, inject the source AND require the LLM to repeat the injected source ID in its answer, then verify by comparing the cited IDs against the ones actually injected.
- Don't mix high-confidence and low-confidence sources without flagging. If RAG returns chunks at score 0.9 and 0.5, the LLM treats them equally. Either threshold strictly OR weight injection role to mark uncertainty.
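The citation check from the second anti-pattern can be sketched as below, assuming you instruct the model to cite sources as `[source:<id>]` and that you kept each injection's `sourceId` from the `context.injected` payload. Both the bracket syntax and `verifyCitations` are hypothetical conventions for this example, not agentfootprint features.

```typescript
interface CitationReport {
  cited: string[];
  verified: string[]; // cited IDs that were actually injected
  unknown: string[]; // cited IDs that were never injected: likely hallucinated
}

// Extract every [source:<id>] marker from the answer and compare against
// the IDs that were injected for this turn.
function verifyCitations(answer: string, injectedIds: Iterable<string>): CitationReport {
  const allowed = new Set(injectedIds);
  const cited = Array.from(answer.matchAll(/\[source:([^\]\s]+)\]/g), (m) => m[1]);
  return {
    cited,
    verified: cited.filter((id) => allowed.has(id)),
    unknown: cited.filter((id) => !allowed.has(id)),
  };
}

const report = verifyCitations(
  'Refunds are available [source:refund-policy]. See also [source:refund-policy-2019].',
  ['refund-policy', 'shipping-policy'],
);
console.log(report.unknown); // [ 'refund-policy-2019' ]
```

A non-empty `unknown` list is a concrete, storable signal to attach to the grounding triplet for that turn.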
Next steps
- RAG guide — the strict-threshold retrieval pattern
- Memory guide — Causal memory for cross-run grounding
- Observability guide — the full 59-event taxonomy
