Fallback models

Fallbacks help you maintain uptime and stay within latency and cost budgets when a provider is degraded or a prompt or model is too expensive.

When to fall back

  • rate limiting or provider outage
  • request deadlines approaching
  • cost budgets exceeded for a segment
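
The triggers above can be expressed as a single check that returns a reason string. This is a minimal sketch, not a prescribed implementation: RequestState, its fields, the reason strings, and the 0.8 deadline margin are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestState:
    """Hypothetical snapshot of one in-flight request."""
    status_code: Optional[int]   # last HTTP status from the provider, if any
    elapsed_s: float             # time already spent on this request
    deadline_s: float            # total latency budget for the request
    segment_spend_usd: float     # spend so far for this segment
    segment_budget_usd: float    # spend limit for this segment


def fallback_reason(state: RequestState, deadline_margin: float = 0.8) -> Optional[str]:
    """Return a short reason string if a fallback should trigger, else None."""
    # HTTP 429 means rate limited; 5xx is treated here as a provider outage.
    if state.status_code == 429:
        return "rate_limited"
    if state.status_code is not None and state.status_code >= 500:
        return "provider_outage"
    # Deadline approaching: most of the latency budget is already used.
    if state.elapsed_s >= deadline_margin * state.deadline_s:
        return "deadline_approaching"
    # Cost budget exceeded for the segment.
    if state.segment_spend_usd > state.segment_budget_usd:
        return "cost_budget_exceeded"
    return None
```

Returning the reason rather than a boolean means the same value can later be recorded on the trace.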

Fallback strategies

  • model downgrade: switch to a cheaper/faster model
  • cap output: reduce max tokens or compress context
  • degraded mode: safe, minimal response with “try again” messaging
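
One way to combine these strategies is to try them in order, from most to least capable. The sketch below assumes a caller-supplied call_model function; its signature, the model ids, the token limits, and the degraded-mode message are all placeholders, not part of any real provider API.

```python
from typing import Callable

# Hypothetical signature: (model_id, prompt, max_tokens) -> response text.
CallModel = Callable[[str, str, int], str]

DEGRADED_MESSAGE = "We can't complete this request right now. Please try again shortly."


def complete_with_fallback(
    call_model: CallModel,
    prompt: str,
    primary_model: str = "primary-model",    # placeholder id
    fallback_model: str = "cheaper-model",   # placeholder id
    max_tokens: int = 1024,
    capped_max_tokens: int = 256,
) -> dict:
    """Try each strategy in order and report which one produced the answer."""
    attempts = [
        ("primary", primary_model, max_tokens),
        ("model_downgrade", fallback_model, max_tokens),
        ("cap_output", fallback_model, capped_max_tokens),
    ]
    for strategy, model, tokens in attempts:
        try:
            text = call_model(model, prompt, tokens)
            return {"strategy": strategy, "model": model, "text": text}
        except Exception:
            # Any failure moves on to the next strategy. Real code would
            # likely catch only the provider's specific error types.
            continue
    # Degraded mode: no model call succeeded, so return a safe, minimal response.
    return {"strategy": "degraded_mode", "model": None, "text": DEGRADED_MESSAGE}
```

The returned strategy and model fields tell the caller which path was taken, so the choice does not have to be inferred from the response text.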

Trace it

Make fallbacks visible:

  • add fallback: true metadata on the span
  • record the original and fallback model IDs
  • record why the fallback was triggered
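
As a rough illustration of the three items above, the sketch below writes them onto a span as metadata. The Span class is a minimal stand-in rather than a real tracing library's type, and the metadata key names other than fallback are assumptions; substitute whatever your tracer provides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Span:
    """Minimal stand-in for a tracing span; a real tracer supplies its own type."""
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def record_fallback(span: Span, original_model: str, fallback_model: str, reason: str) -> None:
    """Attach the fallback flag, both model ids, and the trigger reason to the span."""
    span.metadata["fallback"] = True
    span.metadata["fallback.original_model"] = original_model  # assumed key name
    span.metadata["fallback.model"] = fallback_model           # assumed key name
    span.metadata["fallback.reason"] = reason                  # assumed key name


# Usage: the model ids and reason below are placeholders.
span = Span(name="llm.completion")
record_fallback(span, "primary-model", "cheaper-model", "rate_limited")
```

Keeping the reason as a short, fixed string makes it easier to filter and count fallbacks by cause later.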