Spans
A span is a named unit of work inside a trace. Spans form a tree: a parent span contains child spans. This tree should match how you think about your workflow.
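To make the tree shape concrete, here is a minimal sketch of a span hierarchy. The `Span` class, the `render` helper, and the root name `workflow.run` are hypothetical and exist only for this illustration; they are not the API of any particular tracing SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical in-memory span used only for illustration."""
    name: str
    children: List["Span"] = field(default_factory=list)

    def child(self, name: str) -> "Span":
        # Create a child span and attach it under this span.
        span = Span(name)
        self.children.append(span)
        return span


def render(span: Span, depth: int = 0) -> List[str]:
    # Return one indented line per span, walking the tree depth-first.
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# "workflow.run" is an assumed root name; the other names come from the text.
root = Span("workflow.run")
root.child("retrieval.query")
completion = root.child("llm.completion")
completion.child("tool.execute")

tree_lines = render(root)
print("\n".join(tree_lines))
```

Reading the indented output top to bottom should resemble how you would describe the workflow out loud: one request, a retrieval step, then a completion that calls a tool.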
What deserves a span
Create spans for steps you would want to:
- measure latency for
- debug when things go wrong
- compare across releases and models
Typical spans:
- retrieval.query
- retrieval.rerank
- llm.completion
- tool.execute
- guardrails.validate
- postprocess.format
Span naming conventions
Use stable, predictable names:
- Prefer noun.verb or domain.action (retrieval.query, tool.execute)
- Avoid embedding dynamic values in names (use metadata instead)
- Keep the set of names small so dashboards and filters stay clean
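The naming rules above could be checked mechanically. The sketch below assumes one possible convention (two lowercase segments joined by a dot, with no digits) and shows a dynamic value moved into metadata. `NAME_PATTERN`, `is_stable_name`, and `describe_tool_call` are hypothetical names, and the regular expression is a heuristic rather than a rule taken from any SDK.

```python
import re
from typing import Any, Dict

# Assumed convention: "domain.action", lowercase letters and underscores only.
# Digits are rejected as a rough signal that an ID leaked into the name.
NAME_PATTERN = re.compile(r"[a-z][a-z_]*\.[a-z][a-z_]*")


def is_stable_name(name: str) -> bool:
    """Return True if the name matches the assumed domain.action convention."""
    return NAME_PATTERN.fullmatch(name) is not None


def describe_tool_call(tool: str, request_id: str) -> Dict[str, Any]:
    # The span name stays fixed; values that vary per call go into metadata.
    return {
        "name": "tool.execute",
        "metadata": {"tool": tool, "request_id": request_id},
    }


print(is_stable_name("retrieval.query"))        # a stable name
print(is_stable_name("tool.execute_req_8812"))  # an ID embedded in the name
print(describe_tool_call("web_search", "req_8812"))
```

Keeping the name fixed means every tool call groups under one `tool.execute` entry, while the per-call values remain available for filtering.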
What to attach to a span
Add structured fields that help debugging and filtering:
- Provider/model for llm.* spans
- Token usage and finish reasons when available
- Tool arguments/results (sanitized) for tool.* spans
- Cache hits for retrieval or completion caches
- Retries and error codes
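As an illustration of the fields listed above, the following sketch builds field dictionaries for an llm.* span and a tool.* span. All field names, the placeholder values, the `SENSITIVE_KEYS` list, and the `sanitize` helper are assumptions made for this example; use whatever schema your tracing setup actually expects.

```python
from typing import Any, Dict

# Assumption: an illustrative deny-list; extend it for your own data.
SENSITIVE_KEYS = {"api_key", "authorization", "password", "token"}


def sanitize(arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Redact values of sensitive-looking keys before attaching them to a span."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in arguments.items()
    }


# Hypothetical field names and placeholder values for an llm.* span.
llm_fields = {
    "provider": "example-provider",
    "model": "example-model",
    "tokens.input": 512,
    "tokens.output": 128,
    "finish_reason": "stop",
    "cache.hit": False,
    "retry_count": 0,
}

# Hypothetical fields for a tool.* span; arguments are sanitized first.
tool_fields = {
    "tool.name": "web_search",
    "tool.arguments": sanitize({"query": "span naming", "api_key": "sk-not-real"}),
    "error.code": None,
}

print(tool_fields["tool.arguments"])
```

Flat, consistently named keys like these tend to be easier to filter on than free-form text blobs.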
Timing
If you wrap a function call in a span, the span starts when the call begins and ends when it returns, so the call's duration is captured without separate timing code. That gives you:
- per-step latency breakdowns
- p95/p99 hotspots
- “what changed?” comparisons across releases
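A rough sketch of how wrapping a call can produce these timings, using only the Python standard library. `timed_span`, `durations_ms`, and `percentile` are hypothetical helpers written for this example, and the nearest-rank percentile is one simple choice among several definitions.

```python
import math
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List

# Collected durations in milliseconds, grouped by span name.
durations_ms: Dict[str, List[float]] = {}


@contextmanager
def timed_span(name: str) -> Iterator[None]:
    # Measure wall-clock time around the wrapped block, even if it raises.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        durations_ms.setdefault(name, []).append(elapsed_ms)


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile for a non-empty list and 0 < p <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Time a stand-in for a retrieval call a few times.
for _ in range(5):
    with timed_span("retrieval.query"):
        time.sleep(0.002)

print(f"p95 retrieval.query: {percentile(durations_ms['retrieval.query'], 95):.2f} ms")
```

Grouping durations by a stable span name is what makes the per-step breakdowns and release-to-release comparisons possible.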
Error handling
A span should record failures as first-class data:
- mark spans as error when exceptions occur
- record error type/message (avoid stack traces if they include secrets)
- include retry counts and whether a fallback was used
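The points above might look like this in practice. This is a minimal sketch with hypothetical names (`SpanRecord`, `span`, `finished`, `run_with_fallback`, and the field keys); it records the error type and message but deliberately leaves out the stack trace.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Optional


@dataclass
class SpanRecord:
    """Hypothetical record of one finished span."""
    name: str
    status: str = "ok"
    error_type: Optional[str] = None
    error_message: Optional[str] = None
    fields: Dict[str, Any] = field(default_factory=dict)


finished: List[SpanRecord] = []


@contextmanager
def span(name: str) -> Iterator[SpanRecord]:
    record = SpanRecord(name)
    try:
        yield record
    except Exception as exc:
        # Mark the span as failed; keep type and message, skip the traceback.
        record.status = "error"
        record.error_type = type(exc).__name__
        record.error_message = str(exc)
        raise
    finally:
        finished.append(record)


def run_with_fallback(primary: Callable[[], str], fallback: Callable[[], str],
                      max_attempts: int = 2) -> str:
    with span("llm.completion") as rec:
        rec.fields["fallback_used"] = False
        for attempt in range(max_attempts):
            rec.fields["retry_count"] = attempt
            try:
                return primary()
            except TimeoutError as exc:
                rec.fields["last_error"] = type(exc).__name__
        # Every attempt timed out, so record that the fallback answered.
        rec.fields["fallback_used"] = True
        return fallback()


def always_times_out() -> str:
    raise TimeoutError("upstream timed out")


# Case 1: an unhandled exception marks the span as an error.
try:
    with span("tool.execute"):
        raise ValueError("bad tool arguments")
except ValueError:
    pass

# Case 2: retries are exhausted and a fallback is used.
answer = run_with_fallback(always_times_out, lambda: "fallback answer")
print(finished)
```

In the second case the span ends with status "ok" because the fallback succeeded, yet the retry count and fallback flag still show that the primary path failed. Whether that should count as an error is a policy choice for your team.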