Backward Causal Chain: Error Stack Traces for Data Quality

April 2026 · Sanjay Krishna Anbalagan


Your loan pipeline runs 5 stages. The final decision is rejected, but the quality score is 0.2 — something went wrong. Which stage caused it?

You could read the narrative top to bottom. But in an agent loop with 15 iterations, 3 tool calls each, that’s 45 stages of output. You need a stack trace for quality — not for crashes, but for data quality degradation.

```
Quality Trace (score: 0.20 at reject#4):
  at reject#4     score=0.20 — rejection decision
  at eval-risk#2  score=0.40 (via riskTier) — high risk assigned
  at seed#0       score=1.00 (via creditScore) — input data

Root cause: quality dropped at reject#4 (0.40 → 0.20, Δ0.20)
```

This is causalChain() — backward program slicing on the commit log.


Every stage in footprintjs writes through a transactional buffer. The commit log records what each stage wrote. Every recorder tracks what each stage read. Together, they form an implicit dependency graph:

```
Seed wrote creditScore, dti
  ↓  (EvalRisk reads creditScore, dti)
EvalRisk wrote riskTier
  ↓  (Route reads riskTier)
Route → chose reject branch
  ↓  (Reject reads riskTier)
Reject wrote decision
```

causalChain() walks this graph backward from any starting point. The algorithm is BFS backward thin-slicing — a simplified form of program slicing (Weiser 1984, Sridharan et al. 2007):

  1. Start at the target step (e.g., reject#4)
  2. Get what it read via the getKeysRead callback
  3. For each key read, find who last wrote it (findLastWriter)
  4. Create a parent node, enqueue it
  5. Repeat until no more dependencies
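The steps above can be sketched in a few dozen lines. This is a minimal illustration of BFS backward thin-slicing, not footprintjs's actual implementation — the `Commit` and `CausalNode` shapes and the `causalChainSketch` name are assumptions for the example:

```typescript
// Hypothetical shapes: each commit records a stage id and the keys it wrote;
// getKeysRead reports what a stage read (the recorder's job in footprintjs).
type Commit = { stageId: string; keysWritten: string[] };
type CausalNode = { stageId: string; viaKeys: string[]; parents: CausalNode[] };

function causalChainSketch(
  log: Commit[],
  target: string,
  getKeysRead: (stageId: string) => string[],
): CausalNode {
  // findLastWriter: scan backward for the last commit that wrote `key`
  // before index `before` (the small-pipeline strategy).
  const findLastWriter = (key: string, before: number): number => {
    for (let i = before - 1; i >= 0; i--) {
      if (log[i].keysWritten.includes(key)) return i;
    }
    return -1;
  };

  const targetIdx = log.findIndex((c) => c.stageId === target);
  const root: CausalNode = { stageId: target, viaKeys: [], parents: [] };
  const seen = new Map<number, CausalNode>([[targetIdx, root]]);
  const queue: Array<[number, CausalNode]> = [[targetIdx, root]];

  while (queue.length > 0) {
    const [idx, node] = queue.shift()!;
    for (const key of getKeysRead(log[idx].stageId)) {
      const writerIdx = findLastWriter(key, idx);
      if (writerIdx < 0) continue; // external input: no writer in the log
      let parent = seen.get(writerIdx);
      if (!parent) {
        parent = { stageId: log[writerIdx].stageId, viaKeys: [], parents: [] };
        seen.set(writerIdx, parent);
        queue.push([writerIdx, parent]); // BFS: enqueue the new dependency
      }
      parent.viaKeys.push(key);
      if (!node.parents.includes(parent)) node.parents.push(parent);
    }
  }
  return root; // a DAG rooted at the target step
}
```

Because `seen` is keyed by commit index, a stage that two children depend on becomes one shared node — which is why the output is a DAG rather than a tree.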

The output is a DAG (not a linked list). If a stage reads creditScore AND dti from different writers, it has two parents:

```
        Seed
       /    \
  EvalRisk   |      (EvalRisk also writes dti)
       \     |
        Route       (reads dti ← EvalRisk, creditScore ← Seed)
          |
        Reject
```

Staged optimization: the query optimizer trick

For small pipelines (≤ 256 commits), findLastWriter scans the log backward — O(N) per lookup, zero setup cost. Fast enough.

For agent loops with 500+ iterations, that’s expensive. So causalChain() automatically switches to a reverse index: a prebuilt Map<key, sortedWriterIndices[]> that enables O(log N) binary search per lookup.

The consumer never sees this. It’s chosen internally — like a database query optimizer selecting between sequential scan and index scan based on table size.

| Pipeline Size | Strategy | Per-Lookup Cost |
| --- | --- | --- |
| ≤ 256 commits | Linear scan | O(N) — zero setup |
| > 256 commits | Reverse index + binary search | O(log N) — O(N×U) setup amortized |
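The staged lookup can be sketched as a factory that picks a strategy once, up front. The threshold and the shapes here are illustrative, not footprintjs internals:

```typescript
// Hypothetical commit shape for the sketch.
type CommitEntry = { stageId: string; keysWritten: string[] };

const INDEX_THRESHOLD = 256; // illustrative cutoff, mirroring the table above

function makeLastWriterLookup(log: CommitEntry[]) {
  if (log.length <= INDEX_THRESHOLD) {
    // Small pipeline: O(N) backward scan per lookup, zero setup cost.
    return (key: string, before: number): number => {
      for (let i = before - 1; i >= 0; i--) {
        if (log[i].keysWritten.includes(key)) return i;
      }
      return -1;
    };
  }

  // Large pipeline: one O(N×U) pass builds Map<key, ascending writer indices>.
  const index = new Map<string, number[]>();
  log.forEach((commit, i) => {
    for (const key of commit.keysWritten) {
      let arr = index.get(key);
      if (!arr) index.set(key, (arr = []));
      arr.push(i); // appended in log order, so already sorted
    }
  });

  // O(log N) per lookup: rightmost writer index strictly below `before`.
  return (key: string, before: number): number => {
    const writers = index.get(key);
    if (!writers) return -1;
    let lo = 0, hi = writers.length - 1, best = -1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (writers[mid] < before) { best = writers[mid]; lo = mid + 1; }
      else hi = mid - 1;
    }
    return best;
  };
}
```

Both branches return the same `(key, before) => index` signature, so the caller is oblivious to which strategy was chosen — the query-optimizer property.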

This is where the recorder operations become the foundation:

```ts
import { causalChain, flattenCausalDAG, formatCausalChain } from 'footprintjs/trace';
import { QualityRecorder, qualityTrace, formatQualityTrace } from 'footprintjs/trace';

// 1. Collect quality scores during traversal (Translate pattern)
const quality = new QualityRecorder((id, ctx) => {
  if (ctx.keysWritten.includes('decision')) return { score: 0.2, factors: ['rejection'] };
  if (ctx.keysWritten.includes('riskTier')) return { score: 0.4, factors: ['high risk'] };
  return { score: 1.0 };
});
executor.attachRecorder(quality);
await executor.run();

// 2. Find the lowest-scoring step (Aggregate pattern)
const lowest = quality.getLowest();

// 3. Backtrack through the causal chain
const trace = qualityTrace(
  executor.getSnapshot().commitLog,
  quality,
  lowest.runtimeStageId,
);
console.log(formatQualityTrace(trace));
```

The QualityRecorder uses Translate (per-step scores), Accumulate (progressive quality up to a given step), and Aggregate (overall score). The causalChain() utility reads from the commit log — which was built during the same traversal, zero post-processing.

Everything is collected during the single DFS pass. The backtracking is a post-execution query over data that was already captured.


| Scenario | Without causal chain | With causal chain |
| --- | --- | --- |
| Debug low-quality agent response | 30 min reading logs | 10 seconds reading trace |
| Root cause analysis | Senior eng required | Any engineer (or the LLM itself) |
| SLA investigation | "We'll look into it" | Immediate root cause ID |

At 5 quality incidents per week, 20 minutes saved each: ~7 hours/month of senior eng time returned.

The causal chain is LLM context — instead of the LLM reasoning across scattered data, it reads the pre-computed dependency graph:

| Approach | Tokens | Model Required | Cost/1M |
| --- | --- | --- | --- |
| LLM reasons across logs | ~2,500 | Reasoning model | $15.00 |
| LLM reads causal chain | ~200 | Any model | $0.25 |

At 10,000 queries/day: ~$375/day savings.
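Spelling out the arithmetic — rates and token counts are the table's; the code just does the multiplication:

```typescript
// Per-query cost = (tokens / 1M) × price-per-1M-tokens.
const QUERIES_PER_DAY = 10_000;

const reasoningCost = (2_500 / 1_000_000) * 15.0;  // ≈ $0.0375 per query
const chainCost     = (200   / 1_000_000) * 0.25;  // ≈ $0.00005 per query

const dailySavings = (reasoningCost - chainCost) * QUERIES_PER_DAY;
console.log(dailySavings.toFixed(2)); // ≈ 374.50
```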

When you can see exactly where quality dropped, you fix the right thing:

  • System prompt too vague? → Fix the prompt, not the model
  • Tool returned bad data? → Fix the tool, not the agent loop
  • Context window too large? → Trim the right messages

Without the causal chain, teams throw money at bigger models hoping it fixes quality. With it, they make targeted fixes.


```
memory/backtrack.ts         → causalChain() — pure algorithm, leaf node
  depends on: memory/types, memory/commitLogUtils
recorder/QualityRecorder.ts → scoring during traversal
  depends on: recorder/KeyedRecorder
recorder/qualityTrace.ts    → decorates causalChain with quality scores
  depends on: memory/backtrack + recorder/QualityRecorder
```

Clean lib-of-libs layering. The backtracking algorithm knows nothing about quality. The quality layer knows nothing about how the DAG is built. They compose.


```sh
npx tsx examples/post-execution/causal-chain/05-diamond.ts
npx tsx examples/post-execution/quality-trace/02-root-cause.ts
```

Or explore in the interactive playground.