Prompt management
Prompts are part of your system: they should be versioned, reviewable, and deployable with the same discipline as backend code. Boson prompt management helps you store templates, attach them to environments, and roll forward—or roll back—when evals or production signal a problem.
This pairs directly with Tracing (know which version ran) and Evaluations (prove a version is better).
Why version prompts
- Reproducibility — “what did we send to the model last Tuesday?”
- Collaboration — PR-style review instead of shared Google Docs.
- Safe rollback — one click when a deploy hurts quality.
- Audit — who changed what, when, and in which environment.
Prompts as templates
Treat each prompt as a template with variables, not a hard-coded string in code.
Variables
- User content — message, attachment metadata.
- System context — locale, plan tier, feature flags.
- Retrieved context — RAG snippets (with clear delimiters in the template).
Keep variable names stable across versions so code does not churn when copy changes.
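As a minimal sketch (the template text and variable names here are illustrative, not a Boson API), a prompt template is just a string with named slots that rendering fills in:

```python
# Illustrative template: variables are named slots, not hard-coded strings.
SUPPORT_TEMPLATE = (
    "You are a support assistant for {plan_tier} customers.\n"
    "Context:\n<context>\n{retrieved_context}\n</context>\n"
    "User message: {user_message}"
)

def render(template: str, **variables: str) -> str:
    """Fill the template; raises KeyError if a variable is missing,
    which catches template/code drift early."""
    return template.format(**variables)

prompt = render(
    SUPPORT_TEMPLATE,
    plan_tier="Pro",
    retrieved_context="Refund policy: 30 days.",
    user_message="Can I get a refund?",
)
```

Because the variable names (`plan_tier`, `retrieved_context`, `user_message`) stay stable across versions, the copy inside the template can change freely without touching the calling code.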
Structure
- Prefer a system block for rules and constraints.
- Use a user block for the actual task and dynamic content.
- Document invariants (“never invent URLs”, “cite sources”) in system text reviewers can see.
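A sketch of that split, assuming a chat-style message list (the helper name and invariant text are illustrative): rules and invariants live in the system block, the task and dynamic content in the user block, with delimiters around retrieved context.

```python
def build_messages(system_rules: str, task: str, context: str) -> list[dict]:
    """Separate stable rules (system) from the dynamic task (user).
    Reviewers can audit invariants in system_rules without reading code."""
    return [
        {"role": "system", "content": system_rules},
        {
            "role": "user",
            "content": f"<context>\n{context}\n</context>\n\n{task}",
        },
    ]

messages = build_messages(
    system_rules="Never invent URLs. Cite sources from the provided context.",
    task="Answer the customer's question.",
    context="Refund policy: 30 days.",
)
```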
Environments
Map prompts to environments the same way you map APIs:
| Environment | Typical use |
|---|---|
| Development | Fast iteration, noisy experiments OK. |
| Staging | Mirror prod config; run full eval suites here. |
| Production | Only promoted, reviewed versions. |
Promotion should be explicit (button or API), not “whoever edited last.”
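One way to keep promotion explicit is to enforce the environment order in code. This is a sketch under the assumption of the three environments in the table above, not a Boson API:

```python
# Promotion must move one step forward through the pipeline, never skip.
ENVIRONMENTS = ("development", "staging", "production")

def can_promote(current_env: str, target_env: str) -> bool:
    """A version reaches production only via staging."""
    return ENVIRONMENTS.index(target_env) == ENVIRONMENTS.index(current_env) + 1
```

With a rule like this, "whoever edited last" cannot push a development draft straight to production.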
Lifecycle: draft → review → promote
- Draft a new version (copy change, model instruction tweak, new variable).
- Run evals against your pinned dataset; compare to the current production prompt.
- Review with a teammate for policy, brand, and failure modes.
- Promote to staging, then production after checks pass.
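The eval step above can be expressed as a simple gate: promote only when the candidate beats (or at least matches) the current production version on the pinned dataset. A minimal sketch, assuming per-example scores in [0, 1]; the function name and threshold are illustrative:

```python
def passes_eval_gate(
    candidate_scores: list[float],
    production_scores: list[float],
    min_delta: float = 0.0,
) -> bool:
    """Block promotion unless the candidate's mean eval score is at
    least min_delta above the current production prompt's mean."""
    candidate_mean = sum(candidate_scores) / len(candidate_scores)
    production_mean = sum(production_scores) / len(production_scores)
    return candidate_mean >= production_mean + min_delta
```

A CI job can call this after running both versions over the same dataset and fail the pipeline when the gate returns False.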
Connecting prompts to traces
When you log traces, record the prompt version ID (and optionally a template hash) on the LLM span. That makes it possible to:
- Correlate spikes in errors with a specific prompt deploy.
- Slice metrics by version in analytics.
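As a sketch, the span attributes might look like this (the attribute keys are illustrative conventions, not a fixed Boson schema); a template hash cheaply detects when the same version ID was edited in place:

```python
import hashlib

def prompt_span_attributes(version_id: str, template: str) -> dict:
    """Attributes to attach to the LLM span for forensics:
    the deployed version ID plus a short hash of the rendered template."""
    template_hash = hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]
    return {
        "prompt.version_id": version_id,
        "prompt.template_hash": template_hash,
    }

attrs = prompt_span_attributes("support-v3", "You are a support assistant...")
```

With these attributes on every span, an error spike can be grouped by `prompt.version_id` and lined up against the deploy timeline.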
Rollback
If production quality drops after a prompt change:
- Revert to the last known-good version immediately.
- Preserve the bad version for postmortem and eval additions.
- Add failing cases to your dataset so the mistake cannot recur silently.
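The revert-and-preserve steps can be sketched as a tiny registry (hypothetical class, not a Boson API): rollback restores the last known-good version and hands back the bad one for the postmortem instead of discarding it.

```python
class PromptRegistry:
    """Tracks the production version and its promotion history."""

    def __init__(self) -> None:
        self.production: str | None = None
        self.history: list[str] = []

    def promote(self, version_id: str) -> None:
        if self.production is not None:
            self.history.append(self.production)
        self.production = version_id

    def rollback(self) -> str:
        """Revert to the last known-good version; return the bad one
        so it can be preserved for postmortem and eval additions."""
        bad_version = self.production
        self.production = self.history.pop()
        return bad_version

registry = PromptRegistry()
registry.promote("support-v2")
registry.promote("support-v3")   # quality drops after this deploy
bad = registry.rollback()        # production is support-v2 again
```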
Access and governance
- Restrict who can edit production-linked prompts.
- Keep read access broad for support and PMs where appropriate.
- Export or backup critical templates for disaster recovery.
Common pitfalls
- Prompts only in code — you lose history and non-engineer collaboration.
- No eval gate — copy changes ship without measurement.
- Unbounded variables — missing validation lets oversized context blow past token limits or leak data.
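For the last pitfall, a minimal guard is to bound every variable before rendering. A sketch with an illustrative character budget (a real system would budget in tokens and may also redact sensitive patterns):

```python
MAX_VARIABLE_CHARS = 8000  # illustrative per-variable budget

def bound_variable(value: str, limit: int = MAX_VARIABLE_CHARS) -> str:
    """Truncate an oversized variable value and mark the cut, so a
    runaway RAG snippet cannot blow past the model's context window."""
    if len(value) <= limit:
        return value
    return value[:limit] + "\n[truncated]"
```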
Next steps
- Evaluations — gate every prompt promotion on metrics.
- Tracing — tag spans with prompt version for production forensics.