Grounding Analysis

An LLM calls lookup_order and gets {status: 'shipped', amount: 299}, then tells the user “Your order of $399 is being processed.” Both the amount and the status contradict the tool result. This is a hallucination — and most frameworks have no way to detect it without a separate eval pipeline.

ExplainRecorder collects sources (tool results), claims (LLM outputs), and decisions (tool calls) during traversal — no post-processing.

import { Agent, mock, defineTool } from 'agentfootprint';
import { ExplainRecorder } from 'agentfootprint/explain';
const lookupOrder = defineTool({
  id: 'lookup_order',
  description: 'Look up an order',
  inputSchema: {
    type: 'object',
    properties: { orderId: { type: 'string' } },
    required: ['orderId'],
  },
  handler: async ({ orderId }) => ({
    content: JSON.stringify({ orderId, status: 'shipped', amount: 299 }),
  }),
});
const explain = new ExplainRecorder();
const agent = Agent.create({
  provider: mock([
    { content: '', toolCalls: [{ id: 'tc1', name: 'lookup_order', arguments: { orderId: 'ORD-1003' } }] },
    { content: 'Your order ORD-1003 is shipped. Total: $299.' },
  ]),
})
  .tool(lookupOrder)
  .recorder(explain)
  .build();
await agent.run('Check order ORD-1003');

const report = explain.explain();
// {
//   sources: [{ toolName: 'lookup_order', args: { orderId: 'ORD-1003' }, result: '...' }],
//   claims: [{ content: 'Your order ORD-1003 is shipped. Total: $299.' }],
//   decisions: [{ toolName: 'lookup_order', args: { orderId: 'ORD-1003' }, latencyMs: 2 }],
//   summary: 'Agent called lookup_order (1 call), then responded based on the results.'
// }

Use the structured data to check for hallucinations:

function checkGrounding(report: Explanation): string[] {
  const issues: string[] = [];
  const sourceText = report.sources.map(s => s.result).join(' ');
  for (const claim of report.claims) {
    const amounts = claim.content.match(/\$(\d+)/g) ?? [];
    for (const amount of amounts) {
      if (!sourceText.includes(amount.replace('$', ''))) {
        issues.push(`Claim mentions ${amount} but source data doesn't contain this amount`);
      }
    }
  }
  return issues;
}
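To see the check in action without running an agent, the function is restated below with minimal local types so the snippet runs standalone. The Explanation shape here is an assumption matching the report printed above; the real type from agentfootprint may carry more fields.

```typescript
// Minimal local types so the snippet is self-contained; the real
// Explanation type from agentfootprint may carry more fields.
interface Source { toolName: string; args: unknown; result: string }
interface Claim { content: string }
interface Explanation { sources: Source[]; claims: Claim[] }

function checkGrounding(report: Explanation): string[] {
  const issues: string[] = [];
  const sourceText = report.sources.map(s => s.result).join(' ');
  for (const claim of report.claims) {
    const amounts = claim.content.match(/\$(\d+)/g) ?? [];
    for (const amount of amounts) {
      if (!sourceText.includes(amount.replace('$', ''))) {
        issues.push(`Claim mentions ${amount} but source data doesn't contain this amount`);
      }
    }
  }
  return issues;
}

const sources = [{
  toolName: 'lookup_order',
  args: { orderId: 'ORD-1003' },
  result: '{"orderId":"ORD-1003","status":"shipped","amount":299}',
}];

// Grounded: $299 appears in the tool result.
console.log(checkGrounding({ sources, claims: [{ content: 'Total: $299.' }] }).length); // 0

// Hallucinated: $399 appears nowhere in the sources.
console.log(checkGrounding({ sources, claims: [{ content: 'Your order of $399 is being processed.' }] }).length); // 1
```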
Or assert grounding directly in a test:

import { it, expect } from 'vitest';

it('agent response is grounded in tool results', async () => {
  await agent.run('Check order ORD-1003');
  const report = explain.explain();

  // Sources should exist (tools were called)
  expect(report.sources.length).toBeGreaterThan(0);

  // Claims should reference data actually in the sources
  const sourceText = report.sources.map(s => s.result).join(' ');
  for (const claim of report.claims) {
    const amounts = claim.content.match(/\d{3,}/g) ?? [];
    for (const amount of amounts) {
      expect(sourceText).toContain(amount);
    }
  }
});
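Numeric amounts are not the only thing worth checking. The same pattern extends to categorical fields such as order status; the vocabulary list below is an assumption for illustration, not part of the library:

```typescript
// Check that any order-status word the claim uses actually appears in the
// tool results. STATUS_TERMS is a hypothetical vocabulary for this sketch.
const STATUS_TERMS = ['shipped', 'processing', 'delivered', 'cancelled'];

function checkStatusGrounding(claimText: string, sourceText: string): string[] {
  const issues: string[] = [];
  const claim = claimText.toLowerCase();
  const source = sourceText.toLowerCase();
  for (const term of STATUS_TERMS) {
    if (claim.includes(term) && !source.includes(term)) {
      issues.push(`Claim says "${term}" but no source reports that status`);
    }
  }
  return issues;
}

const source = '{"orderId":"ORD-1003","status":"shipped","amount":299}';
console.log(checkStatusGrounding('Your order is shipped.', source).length);    // 0
console.log(checkStatusGrounding('Your order is processing.', source).length); // 1
```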
In production, run the grounding check after each agent call and log any issues before returning the response:

async function handleMessage(userMessage: string): Promise<string> {
  const explain = new ExplainRecorder();
  const agent = Agent.create({ provider }).tool(orderTool).recorder(explain).build();
  const result = await agent.run(userMessage);

  const report = explain.explain();
  const issues = checkGrounding(report);
  if (issues.length > 0) {
    logger.warn('Grounding issues detected', { issues, report });
  }
  return result.content;
}

ExplainRecorder implements AgentRecorder hooks:

  1. onToolCall — captures each tool result as a source of truth
  2. onTurnComplete — captures the final LLM response as a claim
  3. onLLMCall — tracks model/iteration for claim attribution

All data is collected during traversal, following the core principle: no narrative-entry parsing, no post-processing.
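As a sketch of that flow, here is a toy recorder with the same three hooks. The event shapes and hook signatures are assumptions for illustration, not agentfootprint's actual AgentRecorder interface:

```typescript
// Assumed event shapes -- the real agentfootprint types may differ.
interface ToolCallEvent { toolName: string; args: unknown; result: string; latencyMs: number }
interface TurnCompleteEvent { content: string }
interface LLMCallEvent { model: string; iteration: number }

// A minimal recorder in the spirit of ExplainRecorder: each hook appends to
// in-memory arrays during traversal, and explain() just returns them.
class MiniExplainRecorder {
  sources: { toolName: string; args: unknown; result: string }[] = [];
  decisions: { toolName: string; args: unknown; latencyMs: number }[] = [];
  claims: { content: string }[] = [];
  lastModel?: string;

  onToolCall(e: ToolCallEvent) {
    // A tool result is both a source of truth and a record of a decision.
    this.sources.push({ toolName: e.toolName, args: e.args, result: e.result });
    this.decisions.push({ toolName: e.toolName, args: e.args, latencyMs: e.latencyMs });
  }
  onTurnComplete(e: TurnCompleteEvent) {
    this.claims.push({ content: e.content });
  }
  onLLMCall(e: LLMCallEvent) {
    this.lastModel = e.model; // tracked for claim attribution
  }
  explain() {
    return { sources: this.sources, claims: this.claims, decisions: this.decisions };
  }
}

const r = new MiniExplainRecorder();
r.onLLMCall({ model: 'mock-model', iteration: 1 });
r.onToolCall({ toolName: 'lookup_order', args: { orderId: 'ORD-1003' }, result: '{"status":"shipped","amount":299}', latencyMs: 2 });
r.onTurnComplete({ content: 'Your order ORD-1003 is shipped. Total: $299.' });
console.log(r.explain().sources.length); // 1
```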