# Trace a RAG pipeline
RAG failures often come from retrieval, not the model. Tracing should make it obvious what was retrieved and why the model answered the way it did.
## Spans to include

- `retrieval.query`: query text, filters, top-k, latency
- `retrieval.results`: document ids, scores, chunk ids (avoid full documents if sensitive)
- `retrieval.rerank` (if used): reranker model + top-k changes
- `llm.completion`: prompt template id/version, model, output
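The spans above can be sketched with a minimal stdlib helper. This is illustrative only: the `span` context manager, the `SPANS` list, and all attribute names are assumptions standing in for whatever tracing SDK you actually use (e.g. an OpenTelemetry tracer would follow the same shape).

```python
import time
from contextlib import contextmanager

SPANS = []  # collected spans, standing in for a real tracing backend


@contextmanager
def span(name, **attributes):
    """Record a named span with its attributes and wall-clock latency."""
    start = time.monotonic()
    record = {"name": name, "attributes": dict(attributes)}
    try:
        yield record
    finally:
        record["latency_ms"] = (time.monotonic() - start) * 1000
        SPANS.append(record)


# Trace one retrieval + completion step (ids, scores, and model names are made up).
with span("retrieval.query", query="reset password", top_k=5, filters={"lang": "en"}):
    pass  # run the retriever here

with span(
    "retrieval.results",
    document_ids=["doc-12", "doc-87"],
    chunk_ids=["doc-12#3", "doc-87#0"],
    scores=[0.82, 0.74],
):
    pass  # ids and scores only -- no raw document text

with span("llm.completion", prompt_template="answer-v2", model="example-model"):
    pass  # call the model here
```

Note that the spans carry ids and scores rather than document bodies, which keeps traces useful without copying sensitive content into your tracing backend.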
## What to log without leaking data
- store document ids and chunk ids in spans
- store short summaries of retrieved chunks (optional)
- redact PII and secrets before sending anything
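A redaction pass can run just before attributes are attached to a span. The patterns below are a minimal sketch: the email and secret-key regexes are illustrative assumptions, and production redaction should use a vetted PII/secrets-scanning library rather than hand-rolled patterns.

```python
import re

# Illustrative patterns only -- real redaction needs a vetted PII library.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<email>"),          # email addresses
    (re.compile(r"\b(?:sk|api|key)[-_][A-Za-z0-9]{16,}\b"), "<secret>"),  # key-like tokens
]


def redact(text: str) -> str:
    """Replace matches of each pattern with its placeholder before logging."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


redacted = redact("Contact jane@example.com with key sk-abcdefghijklmnop")
# -> "Contact <email> with key <secret>"
```

Run `redact` on every free-text attribute (queries, summaries, outputs) before it leaves your process.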
## Attribution
If you can, include a simple mapping from answer sentences to chunk ids. Even coarse attribution helps debugging and evals.
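One coarse way to build that mapping is plain word overlap between each answer sentence and the retrieved chunks. This is a sketch under stated assumptions: the `attribute_sentences` helper is hypothetical, and lexical overlap is a cheap proxy, not a faithfulness check.

```python
def attribute_sentences(answer_sentences, chunks):
    """Map each answer sentence to the chunk id with the highest word overlap.

    Purely lexical and deliberately coarse: good enough to point a debugger
    at the likely source chunk, not a guarantee the sentence is grounded.
    """
    mapping = {}
    for sentence in answer_sentences:
        words = set(sentence.lower().split())
        best_id, best_overlap = None, 0
        for chunk_id, text in chunks.items():
            overlap = len(words & set(text.lower().split()))
            if overlap > best_overlap:
                best_id, best_overlap = chunk_id, overlap
        mapping[sentence] = best_id  # None if nothing overlapped
    return mapping


# Example chunks keyed by chunk id (contents made up for illustration).
chunks = {
    "doc-12#3": "Password resets expire after 24 hours",
    "doc-87#0": "Contact support for account lockouts",
}
attribution = attribute_sentences(["Reset links expire after 24 hours."], chunks)
```

Logging this mapping as a span attribute (sentence index to chunk id, not full text) keeps the trace small while still answering "where did this sentence come from?".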
## Next steps

- Add dataset examples from real RAG failures: see Datasets.