Resilience
Production traffic peaks Monday morning. Anthropic returns 429s for the next 90 seconds. Your support agent has 200 concurrent users and zero patience for a backoff loop. The framework’s resilience decorators wrap any provider with retry + fallback so your agent degrades gracefully instead of throwing user-visible errors.
Three composable decorators
| Decorator | What it does |
|---|---|
| withRetry(provider, opts) | Wraps a provider with retry-on-retryable-error. Honors LLMError.retryable classification. AbortSignal-aware sleep. |
| withFallback(primary, fallback) | If primary throws a fallback-eligible error, retry on fallback. Stream pinning prevents provider-flip mid-stream. |
| fallbackProvider([providers]) / resilientProvider({...}) | Convenience composers: chain N fallbacks + retry in one factory call. |
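One way to picture the stream pinning mentioned in the table: fall back only while nothing has reached the caller, and rethrow once the first chunk is out. This is a minimal sketch under that assumption. `ChunkStream` and `pinnedFallbackStream` are hypothetical names and this is not agentfootprint's implementation.

```typescript
// Hypothetical model: a stream is a function returning an async generator of
// text chunks. Not agentfootprint's real streaming interface.
type ChunkStream = () => AsyncGenerator<string>;

function pinnedFallbackStream(primary: ChunkStream, fallback: ChunkStream): ChunkStream {
  return async function* () {
    let emitted = false;
    try {
      for await (const chunk of primary()) {
        emitted = true; // from here on the stream is pinned to the primary
        yield chunk;
      }
    } catch (err) {
      // Switching providers after a chunk was delivered would splice two
      // different answers together, so the error is surfaced instead.
      if (emitted) throw err;
      yield* fallback();
    }
  };
}
```

In this sketch a failure before the first chunk is invisible to the caller, while a failure mid-stream surfaces as an error after the partial output.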
All three preserve the LLMProvider interface — drop-in replacements for the underlying provider. They compose freely:

    import { withRetry, withFallback, anthropic, openai } from 'agentfootprint';

    const provider = withRetry(
      withFallback(
        anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! }),
        openai({ apiKey: process.env.OPENAI_API_KEY! }),
      ),
      { maxAttempts: 5 },
    );

Reads as: try anthropic; on failure fall back to openai; the whole chain is wrapped in retry with 5 attempts. Right-fold of withFallback + outer withRetry is the standard production composition.
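The right-fold can be pictured with a toy model. This sketch assumes a one-method `Provider` interface and uses hypothetical helpers (`toyFallback`, `toyFallbackChain`, `toyRetry`). It leaves out backoff and error classification and is not the library's code.

```typescript
// Toy model: Provider, toyFallback, toyFallbackChain and toyRetry are
// hypothetical stand-ins, not agentfootprint's actual types or code.
interface Provider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Try `primary`; if it throws, ask `fallback` instead.
function toyFallback(primary: Provider, fallback: Provider): Provider {
  return {
    name: `${primary.name}>${fallback.name}`,
    async complete(prompt) {
      try {
        return await primary.complete(prompt);
      } catch {
        return fallback.complete(prompt);
      }
    },
  };
}

// Right-fold: [a, b, c] becomes toyFallback(a, toyFallback(b, c)).
function toyFallbackChain(providers: Provider[]): Provider {
  if (providers.length === 0) throw new Error('need at least one provider');
  return providers.reduceRight((rest, head) => toyFallback(head, rest));
}

// Re-run the wrapped provider until it succeeds or attempts run out.
function toyRetry(inner: Provider, maxAttempts: number): Provider {
  return {
    name: `retry(${inner.name})`,
    async complete(prompt) {
      let lastError: unknown;
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          return await inner.complete(prompt);
        } catch (err) {
          lastError = err;
        }
      }
      throw lastError;
    },
  };
}
```

With this model, `toyRetry(toyFallbackChain([a, b, c]), 5)` walks a, then b, then c on each attempt and only starts a second attempt if all three fail.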
Convenience: resilientProvider
For the common 1-primary-N-fallbacks-with-retry shape, use the factory:

    import { resilientProvider, anthropic, openai, bedrock } from 'agentfootprint';

    const provider = resilientProvider({
      primary: anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! }),
      fallbacks: [
        openai({ apiKey: process.env.OPENAI_API_KEY! }),
        bedrock({ region: 'us-west-2' }),
      ],
      retry: { maxAttempts: 3, backoff: 'exponential' },
    });

Honoring the LLMError.retryable contract
Every provider adapter wraps SDK errors in LLMError and flags each one as retryable or not. The default withRetry policy retries when err.retryable === true AND attempt < maxAttempts. Override shouldRetry to implement custom policies:

    withRetry(provider, {
      maxAttempts: 5,
      shouldRetry: (err, attempt) => {
        if (err.code === 'rate_limit') return true; // always retry rate limits
        if (err.code === 'server') return attempt < 3; // cap server errors at 3
        return false;
      },
    });

shouldFallback works the same way for withFallback.
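To make the retry contract concrete, here is a minimal sketch of a retry loop with a pluggable shouldRetry predicate, exponential backoff, and a sleep that stops waiting when an AbortSignal fires. `ToyLLMError`, `sleep`, `backoffMs`, and `retryWith` are hypothetical names, and the delay values are illustrative assumptions. The sketch also assumes maxAttempts bounds the loop even under a custom policy, which the text above does not confirm for the real decorator.

```typescript
// Hypothetical sketch: ToyLLMError, sleep, backoffMs and retryWith are
// illustrative names, not agentfootprint exports.
class ToyLLMError extends Error {
  code: string;
  retryable: boolean;
  constructor(message: string, code: string, retryable: boolean) {
    super(message);
    this.code = code;
    this.retryable = retryable;
  }
}

// Sleep that rejects as soon as the signal aborts, so a cancelled request
// does not sit out the rest of its backoff delay.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) return reject(new Error('aborted'));
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    function onAbort() {
      clearTimeout(timer);
      reject(new Error('aborted'));
    }
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
function backoffMs(attempt: number, baseMs = 100, capMs = 5000): number {
  return Math.min(capMs, baseMs * 2 ** (attempt - 1));
}

interface RetryOptions {
  maxAttempts: number;
  baseMs?: number;
  signal?: AbortSignal;
  shouldRetry?: (err: ToyLLMError, attempt: number) => boolean;
}

async function retryWith<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  // Default policy mirrors the text: retry only errors flagged retryable.
  const shouldRetry = opts.shouldRetry ?? ((err: ToyLLMError) => err.retryable);
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retry = err instanceof ToyLLMError && shouldRetry(err, attempt);
      // Assumption of this sketch: maxAttempts always bounds the loop.
      if (!retry || attempt >= opts.maxAttempts) throw err;
      await sleep(backoffMs(attempt, opts.baseMs), opts.signal);
    }
  }
}
```

Production backoff usually adds random jitter so that many clients do not retry in lockstep; it is omitted here to keep the delays deterministic.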
Hooks for observability
Both decorators call optional hooks so your recorders can log retry attempts + provider switches:

    withRetry(provider, {
      maxAttempts: 5,
      onRetry: (err, attempt) => console.log(`retry ${attempt}: ${err.message}`),
    });

    withFallback(primary, fallback, {
      onFallback: (err) => console.log(`falling back: ${err.message}`),
    });

Both also fire typed events through the standard event dispatcher. Listen to agentfootprint.cost.tick to see cost across both primary and fallback providers.
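For anything beyond console.log, the hooks can feed a small in-memory recorder. The following is a sketch only. `createResilienceRecorder` and `HookError` are hypothetical, and the only details taken from the examples above are the hook argument shapes `(err, attempt)` and `(err)`.

```typescript
// Hypothetical recorder. Only the hook shapes, (err, attempt) and (err),
// come from the examples above; everything else is illustrative.
interface HookError {
  message: string;
  code?: string;
}

function createResilienceRecorder() {
  const retriesByCode: Record<string, number> = {};
  let fallbacks = 0;
  let lastMessage = '';

  return {
    // Closures instead of `this`, so the methods can be passed around detached.
    onRetry(err: HookError, attempt: number): void {
      const key = err.code ?? 'unknown';
      retriesByCode[key] = (retriesByCode[key] ?? 0) + 1;
      lastMessage = `retry ${attempt}: ${err.message}`;
    },
    onFallback(err: HookError): void {
      fallbacks += 1;
      lastMessage = `falling back: ${err.message}`;
    },
    // Copy of the counters, safe to hand to a metrics exporter.
    snapshot() {
      return { retriesByCode: { ...retriesByCode }, fallbacks, lastMessage };
    },
  };
}
```

Assuming the real hook arguments are structurally compatible with `HookError`, the recorder's methods could be passed directly as `onRetry: recorder.onRetry` and `onFallback: recorder.onFallback`.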
What’s NOT here yet (Reliability subsystem — v2.5)
The deferred Reliability subsystem adds three more primitives that compose ON TOP of these decorators:

- CircuitBreaker: trip after N consecutive failures, open for cooldown period, half-open probe before re-closing. Prevents thundering-herd retry on a downed provider.
- 3-tier output fallback: outputFallback(primary, fallback, canned). If both providers fail, return a canned response (or escalate).
- agent.resumeOnError(checkpoint, input): auto-checkpoint at iteration boundaries; resume from the failure point with corrected input.
The current decorators cover the production-critical 80%. The Reliability subsystem covers the long tail.
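Until the subsystem ships, the closed / open / half-open cycle described for CircuitBreaker can be approximated in user code. The sketch below is a generic circuit breaker. `ToyCircuitBreaker` and its constructor arguments are assumptions and do not reflect the planned v2.5 API.

```typescript
// Hypothetical sketch of a circuit breaker. The planned CircuitBreaker API is
// not described here, so every name and option below is an assumption.
type BreakerState = 'closed' | 'open' | 'half-open';

class ToyCircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private threshold: number;
  private cooldownMs: number;
  private now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock keeps the breaker testable
  }

  currentState(): BreakerState {
    // After the cooldown, an open breaker lets a probe call through.
    if (this.state === 'open' && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'half-open';
    }
    return this.state;
  }

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.currentState() === 'open') {
      // Fail fast instead of piling more requests onto a downed provider.
      throw new Error('circuit open: call skipped');
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'closed'; // a successful call (or probe) re-closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      // A failed probe, or N consecutive failures, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.threshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

This version does not limit how many calls may probe concurrently while half-open; a production breaker would normally admit a single probe at a time.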
Anti-patterns
- Don’t retry non-retryable errors. err.retryable is the contract; honor it.
- Don’t put withRetry BELOW withFallback. Wrong order: every retry on the primary delays the fallback. Right order: outer withRetry retries the WHOLE fallback chain.
- Don’t compose decorators inside the provider’s hot path. Build the chain ONCE at app startup; pass the composed provider into every Agent.create({ provider }).
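The cost of the wrong ordering is easy to count in a toy model. The sketch below uses hypothetical call-level helpers (`retry`, `fallback`) rather than the real decorators, and measures how many times a failing primary is hit before the fallback answers.

```typescript
// Toy call-level model; `retry` and `fallback` are hypothetical helpers that
// only mimic the ordering behaviour, without backoff or error classification.
type Call = () => Promise<string>;

const retry = (inner: Call, maxAttempts: number): Call => async () => {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await inner();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
};

const fallback = (primary: Call, secondary: Call): Call => async () => {
  try {
    return await primary();
  } catch {
    return secondary();
  }
};

// Counts how often a permanently failing primary is called before the
// composed chain produces an answer.
async function primaryCallsBeforeSuccess(
  build: (primary: Call, secondary: Call) => Call,
): Promise<number> {
  let primaryCalls = 0;
  const primary: Call = async () => {
    primaryCalls++;
    throw new Error('429');
  };
  const secondary: Call = async () => 'answer from fallback';
  await build(primary, secondary)();
  return primaryCalls;
}

// Wrong order: retry sits below the fallback, so the primary burns all 5 attempts first.
const wrongOrder = (p: Call, s: Call): Call => fallback(retry(p, 5), s);
// Right order: retry wraps the whole chain, so the fallback answers on attempt 1.
const rightOrder = (p: Call, s: Call): Call => retry(fallback(p, s), 5);
```

In this model `primaryCallsBeforeSuccess(wrongOrder)` resolves to 5 and `primaryCallsBeforeSuccess(rightOrder)` resolves to 1. With real backoff between attempts, the wrong order also adds every backoff delay to the user's wait.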
Next steps
- Error handling: typed errors + tool-error contract
- Mocks-first development: mock({ replies }) + decorators stack identically; test resilience policies offline