Grounding Analysis

An LLM calls lookup_order and gets {status: 'shipped', amount: 299}, then tells the user “Your order of $399 is being processed.” Both the amount and the status contradict the tool result. This is a hallucination — and most frameworks have no way to detect it without a separate eval pipeline.

ExplainRecorder collects sources (tool results), claims (LLM outputs), and decisions (tool calls) during traversal — no post-processing.

import { Agent, mock, defineTool } from 'agentfootprint';
import { ExplainRecorder } from 'agentfootprint/explain';
const lookupOrder = defineTool({
  id: 'lookup_order',
  description: 'Look up an order',
  inputSchema: {
    type: 'object',
    properties: { orderId: { type: 'string' } },
    required: ['orderId'],
  },
  handler: async ({ orderId }) => ({
    content: JSON.stringify({ orderId, status: 'shipped', amount: 299 }),
  }),
});
const explain = new ExplainRecorder();
const agent = Agent.create({
  provider: mock([
    { content: '', toolCalls: [{ id: 'tc1', name: 'lookup_order', arguments: { orderId: 'ORD-1003' } }] },
    { content: 'Your order ORD-1003 is shipped. Total: $299.' },
  ]),
})
  .tool(lookupOrder)
  .recorder(explain)
  .build();
await agent.run('Check order ORD-1003');

const report = explain.explain();
// {
//   sources: [{ toolName: 'lookup_order', args: { orderId: 'ORD-1003' }, result: '...' }],
//   claims: [{ content: 'Your order ORD-1003 is shipped. Total: $299.' }],
//   decisions: [{ toolName: 'lookup_order', args: { orderId: 'ORD-1003' }, latencyMs: 2 }],
//   summary: 'Agent called lookup_order (1 call), then responded based on the results.'
// }

Use the structured data to check for hallucinations:

function checkGrounding(report: Explanation): string[] {
  const issues: string[] = [];
  const sourceText = report.sources.map(s => s.result).join(' ');
  for (const claim of report.claims) {
    const amounts = claim.content.match(/\$(\d+)/g) ?? [];
    for (const amount of amounts) {
      if (!sourceText.includes(amount.replace('$', ''))) {
        issues.push(`Claim mentions ${amount} but source data doesn't contain this amount`);
      }
    }
  }
  return issues;
}
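To see the check in action without running an agent, the function is restated below with minimal local types so the snippet runs standalone. The Explanation shape here is an assumption matching the report printed above; the real type from agentfootprint may carry more fields.

```typescript
// Minimal local types so the snippet is self-contained; the real
// Explanation type from agentfootprint may carry more fields.
interface Source { toolName: string; args: unknown; result: string }
interface Claim { content: string }
interface Explanation { sources: Source[]; claims: Claim[] }

function checkGrounding(report: Explanation): string[] {
  const issues: string[] = [];
  const sourceText = report.sources.map(s => s.result).join(' ');
  for (const claim of report.claims) {
    const amounts = claim.content.match(/\$(\d+)/g) ?? [];
    for (const amount of amounts) {
      if (!sourceText.includes(amount.replace('$', ''))) {
        issues.push(`Claim mentions ${amount} but source data doesn't contain this amount`);
      }
    }
  }
  return issues;
}

const sources = [{
  toolName: 'lookup_order',
  args: { orderId: 'ORD-1003' },
  result: '{"orderId":"ORD-1003","status":"shipped","amount":299}',
}];

// Grounded: $299 appears in the tool result.
console.log(checkGrounding({ sources, claims: [{ content: 'Total: $299.' }] }).length); // 0

// Hallucinated: $399 appears nowhere in the sources.
console.log(checkGrounding({ sources, claims: [{ content: 'Your order of $399 is being processed.' }] }).length); // 1
```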
Or assert grounding directly in a test:

import { it, expect } from 'vitest';

it('agent response is grounded in tool results', async () => {
  await agent.run('Check order ORD-1003');
  const report = explain.explain();

  // Sources should exist (tools were called)
  expect(report.sources.length).toBeGreaterThan(0);

  // Claims should reference data actually in the sources
  const sourceText = report.sources.map(s => s.result).join(' ');
  for (const claim of report.claims) {
    const amounts = claim.content.match(/\d{3,}/g) ?? [];
    for (const amount of amounts) {
      expect(sourceText).toContain(amount);
    }
  }
});
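Numeric amounts are not the only thing worth checking. The same pattern extends to categorical fields such as order status; the vocabulary list below is an assumption for illustration, not part of the library:

```typescript
// Check that any order-status word the claim uses actually appears in the
// tool results. STATUS_TERMS is a hypothetical vocabulary for this sketch.
const STATUS_TERMS = ['shipped', 'processing', 'delivered', 'cancelled'];

function checkStatusGrounding(claimText: string, sourceText: string): string[] {
  const issues: string[] = [];
  const claim = claimText.toLowerCase();
  const source = sourceText.toLowerCase();
  for (const term of STATUS_TERMS) {
    if (claim.includes(term) && !source.includes(term)) {
      issues.push(`Claim says "${term}" but no source reports that status`);
    }
  }
  return issues;
}

const source = '{"orderId":"ORD-1003","status":"shipped","amount":299}';
console.log(checkStatusGrounding('Your order is shipped.', source).length);    // 0
console.log(checkStatusGrounding('Your order is processing.', source).length); // 1
```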
In production, run the grounding check after each agent call and log any issues before returning the response:

async function handleMessage(userMessage: string): Promise<string> {
  const explain = new ExplainRecorder();
  const agent = Agent.create({ provider }).tool(orderTool).recorder(explain).build();
  const result = await agent.run(userMessage);

  const report = explain.explain();
  const issues = checkGrounding(report);
  if (issues.length > 0) {
    logger.warn('Grounding issues detected', { issues, report });
  }
  return result.content;
}

ExplainRecorder implements AgentRecorder hooks:

  1. onToolCall — captures each tool result as a source of truth
  2. onTurnComplete — captures the final LLM response as a claim
  3. onLLMCall — tracks model/iteration for claim attribution

All data is collected during traversal, following the core principle: no narrative-entry parsing, no post-processing.
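As a sketch of that flow, here is a toy recorder with the same three hooks. The event shapes and hook signatures are assumptions for illustration, not agentfootprint's actual AgentRecorder interface:

```typescript
// Assumed event shapes -- the real agentfootprint types may differ.
interface ToolCallEvent { toolName: string; args: unknown; result: string; latencyMs: number }
interface TurnCompleteEvent { content: string }
interface LLMCallEvent { model: string; iteration: number }

// A minimal recorder in the spirit of ExplainRecorder: each hook appends to
// in-memory arrays during traversal, and explain() just returns them.
class MiniExplainRecorder {
  sources: { toolName: string; args: unknown; result: string }[] = [];
  decisions: { toolName: string; args: unknown; latencyMs: number }[] = [];
  claims: { content: string }[] = [];
  lastModel?: string;

  onToolCall(e: ToolCallEvent) {
    // A tool result is both a source of truth and a record of a decision.
    this.sources.push({ toolName: e.toolName, args: e.args, result: e.result });
    this.decisions.push({ toolName: e.toolName, args: e.args, latencyMs: e.latencyMs });
  }
  onTurnComplete(e: TurnCompleteEvent) {
    this.claims.push({ content: e.content });
  }
  onLLMCall(e: LLMCallEvent) {
    this.lastModel = e.model; // tracked for claim attribution
  }
  explain() {
    return { sources: this.sources, claims: this.claims, decisions: this.decisions };
  }
}

const r = new MiniExplainRecorder();
r.onLLMCall({ model: 'mock-model', iteration: 1 });
r.onToolCall({ toolName: 'lookup_order', args: { orderId: 'ORD-1003' }, result: '{"status":"shipped","amount":299}', latencyMs: 2 });
r.onTurnComplete({ content: 'Your order ORD-1003 is shipped. Total: $299.' });
console.log(r.explain().sources.length); // 1
```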