Retries & timeouts
Retries are necessary in production, but implemented incorrectly they create hidden costs and duplicate side effects.
Timeout taxonomy
- connect timeout: the connection to the provider cannot be established in time
- read timeout: the provider stalls partway through a response or stream
- overall deadline: a time budget for the whole request, retries included (recommended)
Use request-level deadlines to prevent a single step from consuming the entire SLA.
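One way to apply a request-level deadline is to create a single budget object at the start of the request and derive each step's timeout from whatever is left. The sketch below is illustrative only: the `Deadline` class, its method names, and the injectable `clock` parameter are assumptions made for this example, not part of any particular client library.

```python
import time


class Deadline:
    """Tracks one overall time budget shared by every step of a request.

    Hypothetical helper for illustration; names are not from any library.
    """

    def __init__(self, budget_s: float, clock=time.monotonic):
        # A monotonic clock avoids jumps when the wall clock is adjusted.
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self.remaining() == 0.0

    def timeout_for(self, cap_s: float) -> float:
        """Per-step timeout: the step's own cap, clipped to what is left overall."""
        return min(cap_s, self.remaining())
```

A caller would pass `deadline.timeout_for(cap)` as the connect or read timeout of each attempt and stop starting new work once `deadline.expired()` is true, so that no single step can outlive the request budget.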
Retry rules (recommended)
Retry only when:
- request is idempotent (or you have idempotency keys)
- error is transient (timeouts, 5xx, 429 rate limits)
Avoid retrying:
- invalid requests (4xx, except 408 timeouts and 429 rate limits)
- safety refusals
- deterministic tool failures, unless the inputs have changed
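The rules above can be collapsed into a single predicate that the retry loop consults. This is a minimal sketch under stated assumptions: the function name `should_retry` and its flags are hypothetical, and treating 408 and 429 as the retryable 4xx statuses is an assumption about how the provider reports timeouts and rate limits. Deterministic tool failures are not modeled here; they would need their own check on whether the inputs changed.

```python
from typing import Optional

# Assumption: 408 (request timeout) and 429 (rate limit) are the only
# client-error statuses worth retrying; every 5xx is treated as transient.
RETRYABLE_4XX = {408, 429}


def should_retry(
    status: Optional[int],
    *,
    timed_out: bool = False,
    idempotent: bool = False,
    refused: bool = False,
) -> bool:
    """Return True only if the request is safe and the failure looks transient."""
    if not idempotent:
        # Without idempotency (or an idempotency key) a retry may duplicate side effects.
        return False
    if refused:
        # A safety refusal is a deliberate answer, not a transient fault.
        return False
    if timed_out:
        return True
    if status is None:
        return False
    return status in RETRYABLE_4XX or 500 <= status <= 599
```

For example, `should_retry(503, idempotent=True)` is true, while `should_retry(400, idempotent=True)` and `should_retry(503, idempotent=False)` are both false.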
Backoff strategy
- exponential backoff with jitter
- cap max retries and total time spent
- record retry count and last error in spans
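A minimal sketch of this strategy follows, using the "full jitter" variant in which each delay is drawn uniformly between zero and an exponentially growing ceiling. The function names, default values (`base_s`, `cap_s`, `max_retries`, `budget_s`), and the choice of `TimeoutError` as the transient error are all assumptions for illustration. The `sleep`, `clock`, and `rng` parameters exist only to make the loop testable.

```python
import random
import time


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 8.0,
                  rng=random.random) -> float:
    """Full jitter: uniform in [0, min(cap_s, base_s * 2**attempt))."""
    return rng() * min(cap_s, base_s * (2 ** attempt))


def call_with_retries(fn, *, max_retries: int = 3, budget_s: float = 30.0,
                      is_transient=lambda exc: isinstance(exc, TimeoutError),
                      sleep=time.sleep, clock=time.monotonic, rng=random.random):
    """Call fn, retrying transient failures; returns (result, retries_used).

    Gives up by re-raising the last error when the failure is not transient,
    the retry cap is reached, or the next delay would exceed the time budget.
    """
    start = clock()
    retries = 0
    while True:
        try:
            return fn(), retries
        except Exception as exc:
            if not is_transient(exc) or retries >= max_retries:
                raise
            delay = backoff_delay(retries, rng=rng)
            if (clock() - start) + delay > budget_s:
                raise
            sleep(delay)
            retries += 1
```

The returned retry count, together with the exception that is re-raised on final failure, gives the caller what it needs to record on the span.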
Observability
Always capture:
- retries attempted
- fallback used (yes/no)
- provider/model
- latency per attempt
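These fields can be gathered in one small record per logical call and flattened into span or log attributes at the end. The sketch below uses only the standard library. The class names, the attribute keys, and the provider and model strings in the usage note are placeholders chosen for this example, not a tracing library's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attempt:
    latency_ms: float
    error: Optional[str] = None  # None means the attempt succeeded


@dataclass
class CallRecord:
    provider: str
    model: str
    fallback_used: bool = False
    attempts: List[Attempt] = field(default_factory=list)

    @property
    def retries(self) -> int:
        # The first attempt is not a retry.
        return max(0, len(self.attempts) - 1)

    def to_attributes(self) -> dict:
        """Flatten into a dict suitable for span or structured-log attributes."""
        errors = [a.error for a in self.attempts if a.error]
        return {
            "provider": self.provider,
            "model": self.model,
            "fallback_used": self.fallback_used,
            "retries": self.retries,
            "attempt_latencies_ms": [a.latency_ms for a in self.attempts],
            "last_error": errors[-1] if errors else None,
        }
```

As a usage example, a record created with `CallRecord("example-provider", "example-model")` that receives one failed attempt and one successful attempt reports `retries` as 1 and keeps both latencies, which makes slow first attempts visible even when the call ultimately succeeds.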