Grounding

Your agent answers a customer’s policy question. The customer screenshots the answer and emails your CEO: “I asked your bot whether refunds were available; it said no. Now I’m seeing they ARE available. What’s going on?” You go to look. The agent might have hallucinated, or it might have been given outdated source material. Without grounding evidence captured at the moment of the answer, you cannot tell which. agentfootprint’s typed event stream and memory snapshots make grounding evidence a first-class artifact rather than something you reconstruct after the fact.

Grounding = constraining the LLM to answer from supplied source material, not from training-data memory. Two complementary mechanisms:

  1. Source injection — the source material (docs, policy text, retrieved chunks, prior decisions) lands in the messages slot via RAG, Memory, or defineFact. The LLM sees it as fresh context.
  2. Source observation — the framework records what was injected (event: agentfootprint.context.injected) and what the LLM produced (event: agentfootprint.stream.llm_end). Side-by-side comparison is the grounding audit.

Together they form a verifiable answer: not “the LLM said X”, but “the LLM said X having been given Y as the source material.”

RAG is the standard pattern (see the RAG guide). defineRAG retrieves the top-K matching chunks that score above a strict threshold; if no chunk clears the threshold, NO chunks inject and the LLM gets no retrieved context. This is intentional: low-confidence chunks make hallucination MORE likely, not less.

const docs = defineRAG({
  id: 'policy-docs',
  store,
  embedder,
  topK: 3,
  threshold: 0.7, // strict: no chunks inject if nothing clears 0.7
});
agent.rag(docs);
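The thresholding behaviour described above can be sketched independently of the library. This is a minimal illustration, not defineRAG's actual implementation: `ScoredChunk` and `selectChunks` are hypothetical names, the sample chunks are made up, and the sketch assumes that a higher score means a closer match and that a score exactly equal to the threshold passes (the guide does not say which way that boundary goes).

```typescript
// Hypothetical shape for a retrieved chunk with its similarity score.
interface ScoredChunk {
  id: string;
  text: string;
  score: number; // assumed: higher means more similar
}

// Keep at most topK chunks whose score clears the threshold.
// Returns an empty array when nothing clears it, so nothing is injected.
function selectChunks(
  candidates: ScoredChunk[],
  topK: number,
  threshold: number,
): ScoredChunk[] {
  return candidates
    .filter((c) => c.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Illustrative data only.
const candidates: ScoredChunk[] = [
  { id: 'refund-policy', text: 'Refunds are available within 30 days.', score: 0.82 },
  { id: 'shipping', text: 'Orders ship in 2 business days.', score: 0.41 },
];

const strong = selectChunks(candidates, 3, 0.7); // only 'refund-policy' clears 0.7
const none = selectChunks(candidates, 3, 0.9); // nothing clears 0.9, so no injection
```

The empty result is the important case: the caller injects nothing rather than falling back to the best of a weak set.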

When chunks DO match, they land in the messages slot tagged with source: 'rag'. Observability surfaces show exactly which chunks were retrieved per turn.

For cross-run grounding (the customer asks “why did you tell me that earlier?”), Causal memory persists the agent’s decision evidence from past runs. New questions cosine-match past queries; matching snapshots inject the prior decision evidence. The follow-up answer is grounded in EXACT past facts, not reconstruction.
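As a rough sketch of the cosine-matching idea, the snippet below compares a new query embedding against stored snapshots and returns the closest one above a minimum score. It is an assumption-laden illustration, not the library's Causal memory code: `PastSnapshot`, `cosine`, `bestMatch`, and the `minScore` parameter are all hypothetical, and the two-dimensional embeddings are toy values.

```typescript
// Hypothetical record of a past run: the query, its embedding, and the
// decision evidence that was captured at the time.
interface PastSnapshot {
  query: string;
  embedding: number[];
  evidence: string;
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vector lengths differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the most similar snapshot whose score is at least minScore,
// or undefined when no past query is close enough.
function bestMatch(
  queryEmbedding: number[],
  snapshots: PastSnapshot[],
  minScore: number,
): PastSnapshot | undefined {
  let best: PastSnapshot | undefined;
  let bestScore = minScore;
  for (const s of snapshots) {
    const score = cosine(queryEmbedding, s.embedding);
    if (score >= bestScore) {
      best = s;
      bestScore = score;
    }
  }
  return best;
}
```

Returning `undefined` below the cutoff mirrors the RAG rule: when no past query is close enough, nothing is injected.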

This is the differentiator no other framework has. See Memory guide § Causal memory.

Subscribe to agentfootprint.context.injected to capture every injection in real time:

agent.on('agentfootprint.context.injected', (e) => {
  // e.payload: { source, slot, role?, content, id, ... }
  // source: 'steering' | 'instruction' | 'skill' | 'fact' | 'memory' | 'rag'
  audit.log({ when: Date.now(), source: e.payload.source, content: e.payload.content });
});

Every flavor of context engineering routes through the SAME event channel — the audit log is uniform across Skill, Steering, Instruction, Fact, Memory, RAG. One event taxonomy for context observability; no per-feature integration glue.
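Because every source shares one payload shape, a single record type can cover the whole audit log. The sketch below is an assumption built from the field names in the comment above: the field types (`string` for `content`, `id`, and `slot`), the `AuditRecord` shape, and the helpers `toAuditRecord` and `countBySource` are hypothetical and not part of the library.

```typescript
// Source names as listed in the handler comment above.
type InjectionSource = 'steering' | 'instruction' | 'skill' | 'fact' | 'memory' | 'rag';

// Assumed payload shape; field types are guesses for illustration.
interface InjectedPayload {
  source: InjectionSource;
  slot: string;
  content: string;
  id: string;
  role?: string;
}

// Hypothetical audit record persisted per injection.
interface AuditRecord {
  when: number;
  source: InjectionSource;
  id: string;
  content: string;
}

// Convert a payload into an audit record, stamping the capture time.
function toAuditRecord(payload: InjectedPayload, now: number = Date.now()): AuditRecord {
  return { when: now, source: payload.source, id: payload.id, content: payload.content };
}

// Count how many injections each source contributed.
function countBySource(records: AuditRecord[]): Partial<Record<InjectionSource, number>> {
  const counts: Partial<Record<InjectionSource, number>> = {};
  for (const r of records) {
    counts[r.source] = (counts[r.source] ?? 0) + 1;
  }
  return counts;
}
```

A per-source count like this is one way to spot a turn where, for example, no RAG chunk was injected at all.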

agentfootprint.stream.llm_end carries the full LLMResponse — content, toolCalls, usage, stopReason. Pair with context.injected events to produce grounding triplets:

For each turn:
  injected_chunks = events.filter(e => e.type === 'agentfootprint.context.injected')
  llm_output = events.find(e => e.type === 'agentfootprint.stream.llm_end')
  triplet = { question, injected_chunks, llm_output }

That triplet IS the grounding evidence. Persist it (S3, Postgres, your audit trail) and you can show six months later exactly what the LLM saw and what it said.
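The pseudocode above could be written in TypeScript roughly as follows. This is a sketch under assumptions: it presumes you have already collected one turn's events into an array, that each event exposes its full name in a `type` field, and that the first `llm_end` event in the turn is the one you want (a turn with several LLM calls would need a different rule). `AgentEvent`, `GroundingTriplet`, and `buildTriplet` are hypothetical names.

```typescript
// Assumed minimal event shape: a type name plus an opaque payload.
interface AgentEvent {
  type: string;
  payload: unknown;
}

// The grounding evidence for one turn.
interface GroundingTriplet {
  question: string;
  injected: AgentEvent[];
  llmOutput: AgentEvent | undefined;
}

const INJECTED = 'agentfootprint.context.injected';
const LLM_END = 'agentfootprint.stream.llm_end';

// Pair what was injected with what the LLM produced for a single turn.
function buildTriplet(question: string, turnEvents: AgentEvent[]): GroundingTriplet {
  return {
    question,
    injected: turnEvents.filter((e) => e.type === INJECTED),
    llmOutput: turnEvents.find((e) => e.type === LLM_END),
  };
}

// Serialize for storage in whatever audit trail you use.
function serializeTriplet(triplet: GroundingTriplet): string {
  return JSON.stringify(triplet);
}
```

A triplet whose `llmOutput` is `undefined` or whose `injected` list is empty is worth flagging at write time, since it cannot later prove what the answer was grounded in.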

  • Don’t claim “grounded” without persisting the triplet. Without injection-content + LLM-output captured at the moment, you have NO grounding evidence — just logs.
  • Don’t trust the LLM to cite sources from memory. It will hallucinate URLs / doc IDs. Cite by injecting the source AND requiring the LLM to repeat the injected source ID in its answer (verifiable via comparison).
  • Don’t mix high-confidence and low-confidence sources without flagging. If RAG returns chunks at scores 0.9 and 0.5, the LLM treats them equally. Either apply a strict threshold OR mark the lower-confidence chunks as uncertain when injecting them (for example, through the injection role).
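The source-ID check from the second point can be sketched as a plain string comparison. The citation format `[source:ID]` is an assumption made for this example (you would instruct the LLM to use whatever marker suits your prompts), and `extractCitedIds` and `checkCitations` are hypothetical helpers, not library functions.

```typescript
// Pull every ID cited as [source:ID] out of the answer text.
// The [source:ID] format is an assumed convention for this sketch.
function extractCitedIds(answer: string): string[] {
  const pattern = /\[source:([A-Za-z0-9_-]+)\]/g;
  const ids: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const id = match[1];
    if (id !== undefined) ids.push(id);
  }
  return ids;
}

interface CitationCheck {
  cited: string[];
  unknown: string[]; // cited IDs that were never injected
  grounded: boolean; // at least one citation, and all of them were injected
}

// Compare the IDs the LLM cited against the IDs that were actually injected.
function checkCitations(answer: string, injectedIds: string[]): CitationCheck {
  const cited = extractCitedIds(answer);
  const known = new Set(injectedIds);
  const unknown = cited.filter((id) => !known.has(id));
  return { cited, unknown, grounded: cited.length > 0 && unknown.length === 0 };
}
```

This only verifies that a cited ID was injected; it does not verify that the answer faithfully reflects the cited content, which still needs the persisted triplet.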