Retries & timeouts

Retries are necessary in production, but an incorrect implementation can add hidden cost (extra provider calls and added latency) and duplicate side effects.

Timeout taxonomy

  • connect timeout: the connection to the provider cannot be established in time
  • read timeout: the provider stalls while sending the response or stream
  • overall deadline: a request-level time budget that covers all attempts (recommended)

Use request-level deadlines to prevent a single step from consuming the entire SLA.
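
A request-level deadline can be sketched as a small budget object that every step consults before it starts. This is a minimal illustration, not a prescribed implementation: the `Deadline` class, its method names, and the injectable `clock` parameter are all hypothetical.

```python
import time


class Deadline:
    """Tracks one request-level time budget shared by every step (hypothetical helper)."""

    def __init__(self, budget_s, clock=time.monotonic):
        # A monotonic clock is used so wall-clock adjustments cannot extend the budget.
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self):
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self):
        return self.remaining() <= 0.0

    def timeout_for(self, per_step_cap_s):
        """Timeout for the next step: the per-step cap, or what is left if that is smaller."""
        return min(per_step_cap_s, self.remaining())
```

Each step would pass `deadline.timeout_for(cap)` as its connect or read timeout, so a slow step can use at most its own cap and never more than the time the request has left.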

Retry only when both conditions hold:

  • request is idempotent (or you have idempotency keys)
  • error is transient (timeouts, 5xx, rate limits)

Avoid retrying:

  • invalid requests (4xx, other than 429 rate limits)
  • safety refusals
  • deterministic tool failures, unless the inputs have changed
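
The rules above can be collapsed into one decision function. The sketch below is an assumption-laden illustration: `should_retry` and its parameters are hypothetical names, and it treats HTTP 429 as the rate-limit signal, which may differ by provider.

```python
def should_retry(status, idempotent, is_timeout=False, is_refusal=False):
    """Return True only for transient failures of idempotent requests.

    status      -- HTTP status code, or None if no response was received
    idempotent  -- True if the request is idempotent or carries an idempotency key
    is_timeout  -- True if the attempt failed with a connect or read timeout
    is_refusal  -- True if the provider returned a safety refusal
    """
    if not idempotent:
        return False  # a retry could duplicate side effects
    if is_refusal:
        return False  # the same input will be refused again
    if is_timeout:
        return True
    if status is None:
        return False
    # Assumption: 429 signals rate limiting; every other 4xx is an invalid request.
    return status == 429 or 500 <= status <= 599
```

Deterministic tool failures are not modeled here; a caller would typically skip the retry unless it has changed the inputs.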

Backoff strategy

  • exponential backoff with jitter
  • cap max retries and total time spent
  • record retry count and last error in spans
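
As a rough sketch of the strategy above, the following computes capped exponential delays with "full jitter" (a uniform draw between zero and the exponential ceiling). The function names, default values, and the `rng` parameter are illustrative assumptions, not recommended settings.

```python
import random


def backoff_delay(attempt, base_s=0.5, cap_s=8.0, rng=random.random):
    """Delay before retry number `attempt` (0-based), using full jitter.

    The ceiling doubles with each attempt and is capped at cap_s; the actual
    delay is drawn uniformly from [0, ceiling) so clients do not retry in lockstep.
    """
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng() * ceiling


def retry_schedule(max_retries, total_budget_s, **kwargs):
    """Delays to sleep before each retry, stopping at either cap.

    Stops after max_retries delays, or earlier if the next delay would push the
    total time spent sleeping past total_budget_s.
    """
    delays, spent = [], 0.0
    for attempt in range(max_retries):
        delay = backoff_delay(attempt, **kwargs)
        if spent + delay > total_budget_s:
            break
        delays.append(delay)
        spent += delay
    return delays
```

Note that `retry_schedule` budgets only the time spent sleeping. Time spent inside each attempt would need to be bounded separately, for example by the request-level deadline.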

Observability

Always capture:

  • retries attempted
  • fallback used (yes/no)
  • provider/model
  • latency per attempt
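
One way to collect these fields is a per-call trace that is flattened into span attributes once the call finishes. This is a minimal sketch: `AttemptRecord`, `CallTrace`, and the attribute keys are hypothetical and not tied to any particular tracing library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AttemptRecord:
    provider: str
    model: str
    latency_ms: float
    error: Optional[str] = None  # None means the attempt succeeded


@dataclass
class CallTrace:
    attempts: List[AttemptRecord] = field(default_factory=list)
    fallback_used: bool = False

    def record(self, provider, model, latency_ms, error=None, fallback=False):
        """Append one attempt; mark the call if any attempt used a fallback."""
        self.attempts.append(AttemptRecord(provider, model, latency_ms, error))
        self.fallback_used = self.fallback_used or fallback

    def span_attributes(self):
        """Flatten the trace into key/value pairs suitable for a span."""
        last = self.attempts[-1] if self.attempts else None
        errors = [a.error for a in self.attempts if a.error]
        return {
            "retries_attempted": max(0, len(self.attempts) - 1),
            "fallback_used": self.fallback_used,
            "provider": last.provider if last else None,
            "model": last.model if last else None,
            "latency_ms_per_attempt": [a.latency_ms for a in self.attempts],
            "last_error": errors[-1] if errors else None,
        }
```

The reported `provider` and `model` are those of the final attempt, so a fallback shows up both in `fallback_used` and in the provider that actually served the call.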
