Rollback playbook
When a prompt change hurts production, speed matters. The best rollback is one you can carry out confidently within minutes.
What triggers a rollback
- spike in user complaints or support tickets
- safety/policy violations
- tool misuse or runaway loops
- latency or cost budgets exceeded
- evaluation regressions on production traffic samples
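The triggers above are easier to act on when they are written down as explicit thresholds. The following is a minimal sketch, assuming you already aggregate metrics per prompt version over a monitoring window. The `WindowMetrics` schema, the field names, and every threshold value are hypothetical and only illustrate the shape of the check.

```python
from dataclasses import dataclass


@dataclass
class WindowMetrics:
    """Metrics for one prompt version over a monitoring window (hypothetical schema)."""
    complaints_per_1k: float   # complaints or support tickets per 1,000 requests
    policy_violations: int     # flagged safety/policy violations
    max_tool_calls: int        # most tool calls seen in a single trace
    p95_latency_s: float       # 95th percentile latency, seconds
    cost_per_request: float    # average cost per request, USD
    eval_pass_rate: float      # pass rate on sampled production traffic, 0..1


# Illustrative limits only (assumptions); derive real ones from your own baselines.
LIMITS = {
    "complaints_per_1k": 5.0,
    "policy_violations": 0,
    "max_tool_calls": 20,
    "p95_latency_s": 8.0,
    "cost_per_request": 0.05,
    "eval_pass_rate": 0.90,
}


def rollback_reasons(m: WindowMetrics, limits: dict = LIMITS) -> list:
    """Return the list of triggers that fired; an empty list means no rollback."""
    reasons = []
    if m.complaints_per_1k > limits["complaints_per_1k"]:
        reasons.append("complaint spike")
    if m.policy_violations > limits["policy_violations"]:
        reasons.append("policy violations")
    if m.max_tool_calls > limits["max_tool_calls"]:
        reasons.append("runaway tool loop")
    if m.p95_latency_s > limits["p95_latency_s"]:
        reasons.append("latency budget exceeded")
    if m.cost_per_request > limits["cost_per_request"]:
        reasons.append("cost budget exceeded")
    if m.eval_pass_rate < limits["eval_pass_rate"]:
        reasons.append("eval regression")
    return reasons
```

Returning the reasons rather than a bare boolean gives you text to paste into the status update when the rollback happens.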
Immediate actions
- roll back to the last known-good prompt version
- annotate traces from the affected window with the ID of the bad prompt version
- communicate status internally (what changed, when, and which version was restored)
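The first two actions can be sketched in a few lines. This is an illustration under stated assumptions, not a real library's API: `PromptRegistry` is a hypothetical in-memory stand-in for wherever your prompt versions live, and traces are assumed to be dicts with `prompt_version` and `ts` keys.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PromptRegistry:
    """Hypothetical in-memory prompt store."""
    versions: list = field(default_factory=list)   # deploy order, oldest first
    known_good: set = field(default_factory=set)   # versions verified in production
    active: Optional[str] = None

    def rollback(self) -> dict:
        """Activate the most recent known-good version deployed before the active one."""
        if self.active is None:
            raise RuntimeError("no active version to roll back from")
        idx = self.versions.index(self.active)
        for candidate in reversed(self.versions[:idx]):
            if candidate in self.known_good:
                bad = self.active
                self.active = candidate
                # The returned record doubles as the body of the status update.
                return {
                    "rolled_back_from": bad,
                    "rolled_back_to": candidate,
                    "at": datetime.now(timezone.utc).isoformat(),
                }
        raise RuntimeError("no known-good version to roll back to")


def annotate_bad_window(traces: list, bad_version: str, start, end) -> int:
    """Tag traces served by bad_version with start <= ts < end; return how many were tagged."""
    tagged = 0
    for trace in traces:
        if trace["prompt_version"] == bad_version and start <= trace["ts"] < end:
            trace.setdefault("tags", []).append(f"bad_prompt_version:{bad_version}")
            tagged += 1
    return tagged
```

Tagging by both version and time window keeps the bad traces easy to find later, even if the same version ID is redeployed after a fix.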
Root cause workflow
After rollback:
- identify the failure cluster in traces
- add the failing cases to your dataset
- re-run evals with the new cases included
- ship a fixed version through staging gates
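The middle two steps of this workflow might look like the sketch below. It assumes each failing trace has been given an `expected` output during triage and that exact-match scoring is good enough for the example; real evals often need a rubric or model-based grader instead. The function names and the dict layout are hypothetical.

```python
def add_failing_cases(dataset: list, failing_traces: list) -> int:
    """Append failing cases to the dataset, skipping inputs that are already present."""
    seen = {case["input"] for case in dataset}
    added = 0
    for trace in failing_traces:
        if trace["input"] in seen:
            continue
        dataset.append({
            "input": trace["input"],
            "expected": trace["expected"],   # assumed to be labeled during triage
            "source": "incident",            # marks where the case came from
        })
        seen.add(trace["input"])
        added += 1
    return added


def pass_rate(dataset: list, run_prompt) -> float:
    """Fraction of cases where run_prompt(input) exactly matches the expected output."""
    if not dataset:
        raise ValueError("dataset is empty")
    passed = sum(1 for case in dataset if run_prompt(case["input"]) == case["expected"])
    return passed / len(dataset)
```

Running `pass_rate` against both the rolled-back prompt and the candidate fix, on the same extended dataset, shows whether the fix resolves the new cases without breaking the old ones.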
Prevent repeats
- require eval gates for promotions
- use holdout sets so that prompt fixes do not overfit to the eval cases they were tuned on
- add monitoring dashboards per prompt version
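One way to combine eval gates with a holdout set is sketched below. It assumes each eval case has a stable string ID and that scores are pass rates between 0 and 1. The split names (`dev`, `holdout`), the 20 percent holdout share, and both thresholds are illustrative assumptions.

```python
import hashlib


def is_holdout(case_id: str, holdout_percent: int = 20) -> bool:
    """Deterministically assign a case to the holdout set based on a hash of its ID."""
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable bucket in 0..99
    return bucket < holdout_percent


def can_promote(candidate: dict, baseline: dict,
                min_score: float = 0.90, max_drop: float = 0.02) -> bool:
    """Gate a promotion on both splits: absolute floor plus limited regression vs. baseline."""
    for split in ("dev", "holdout"):
        if candidate[split] < min_score:
            return False
        if baseline[split] - candidate[split] > max_drop:
            return False
    return True
```

Hashing the case ID keeps a case in the same split across runs, so holdout cases never drift into the set used for tuning. A candidate that scores well on `dev` but poorly on `holdout` fails the gate, which is the overfitting signal the holdout set exists to catch.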