Fallback models
Fallbacks help you maintain uptime and stay within latency budgets when a provider is degraded or when a prompt or model is too expensive to run.
When to fall back
- rate limiting or a provider outage
- a request deadline is approaching
- the cost budget for a segment has been exceeded
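As an illustration, the triggers above could be checked in one place that returns a named reason, which is also what you will want to record later. This is a minimal sketch under stated assumptions: `RequestContext`, its fields, the 2-second minimum remaining time, and the reason strings are all hypothetical, and treating HTTP 429 as rate limiting and 5xx as an outage is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestContext:
    # Hypothetical fields: map these onto whatever your gateway actually tracks.
    provider_status: Optional[int]  # last HTTP status from the provider, if any
    deadline: float                 # absolute deadline, on the same clock as `now`
    segment_spend_usd: float        # spend so far for this customer/feature segment
    segment_budget_usd: float       # budget for that segment


def fallback_reason(ctx: RequestContext, now: float,
                    min_remaining_s: float = 2.0) -> Optional[str]:
    """Return why we should fall back, or None to use the primary model."""
    if ctx.provider_status == 429:
        return "rate_limited"
    if ctx.provider_status is not None and ctx.provider_status >= 500:
        return "provider_outage"
    # Not enough time left for the primary model to finish.
    if ctx.deadline - now < min_remaining_s:
        return "deadline"
    if ctx.segment_spend_usd >= ctx.segment_budget_usd:
        return "cost_budget"
    return None
```

Passing `now` in explicitly (for example from `time.monotonic()`) keeps the check deterministic and easy to test.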
Fallback strategies
- model downgrade: switch to a cheaper/faster model
- cap output: reduce max tokens or compress context
- degraded mode: safe, minimal response with “try again” messaging
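The three strategies can be combined into an ordered chain: try the preferred model, downgrade to a cheaper model with a smaller output cap on failure, and serve a degraded response if everything fails. This is a minimal sketch, not a definitive implementation: `call_model`, `ProviderError`, the model ids "large-model" and "small-model", and the degraded message are hypothetical, and context compression is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class ProviderError(Exception):
    """Hypothetical error your client raises on 429s, 5xx responses, or timeouts."""


@dataclass
class Attempt:
    model_id: str    # hypothetical model id
    max_tokens: int  # output cap for this step of the chain


@dataclass
class Result:
    text: str
    model_id: Optional[str]  # None when the degraded response was served
    fallback: bool           # True if anything other than the first model answered
    reason: Optional[str]    # error that caused the most recent fallback, if any


DEGRADED_MESSAGE = "We couldn't complete this request right now. Please try again."


def generate_with_fallback(prompt: str, chain: List[Attempt],
                           call_model: Callable[[str, str, int], str]) -> Result:
    last_error: Optional[str] = None
    for i, attempt in enumerate(chain):
        try:
            text = call_model(attempt.model_id, prompt, attempt.max_tokens)
            return Result(text, attempt.model_id, fallback=i > 0, reason=last_error)
        except ProviderError as exc:
            # Remember why this step failed, then move to the next (cheaper) model.
            last_error = str(exc)
    # Every model in the chain failed: fall through to degraded mode.
    return Result(DEGRADED_MESSAGE, None, fallback=True, reason=last_error)
```

A chain such as `[Attempt("large-model", 1024), Attempt("small-model", 256)]` expresses both a model downgrade and a tighter output cap in one list.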
Trace it
Make fallbacks visible:
- add fallback: true metadata on the span
- record the original and fallback model ids
- record why fallback was triggered
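Recording a fallback on a span could look like the sketch below. `InMemorySpan` is a hypothetical stand-in for your tracer's span; it uses a `set_attribute(key, value)` method because OpenTelemetry spans expose a method of that shape, but the attribute keys (`fallback`, `fallback.original_model`, and so on) are assumptions, not a documented convention.

```python
from typing import Any, Dict, Optional, Protocol


class SpanLike(Protocol):
    def set_attribute(self, key: str, value: Any) -> None: ...


class InMemorySpan:
    """Hypothetical stand-in for a tracing span; stores attributes in a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def record_fallback(span: SpanLike, original_model: str,
                    fallback_model: Optional[str], reason: str) -> None:
    # Flag the span so fallbacks can be filtered and counted.
    span.set_attribute("fallback", True)
    span.set_attribute("fallback.original_model", original_model)
    # Degraded mode has no fallback model; OpenTelemetry attribute values
    # cannot be None, so record an explicit sentinel instead.
    span.set_attribute("fallback.model",
                       fallback_model if fallback_model is not None else "degraded")
    span.set_attribute("fallback.reason", reason)
```

Keeping the reason as a short, fixed vocabulary (for example "rate_limited" or "deadline") makes it easy to group fallbacks by cause in dashboards.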
Next steps
- Add evals for degraded behavior: Evaluations