Observability
Track activity and troubleshoot issues using built-in Analytics and Test Results. Export data or integrate with your own tooling as needed.
Analytics
- Metrics: Test runs, pass/fail rates, judge scores, and latency (avg, p50, p95, max).
- Summary: Suite performance over time with trend charts.
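If you export run data and want to reproduce these metrics in your own tooling, the arithmetic can be sketched as below. This is a minimal illustration, not the product's implementation: the record fields `passed` and `latency_ms` are hypothetical names for whatever your export contains, and the nearest-rank percentile method is an assumption, so values may differ slightly from the built-in charts.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("cannot take a percentile of an empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(runs):
    """Compute run count, pass rate, and latency statistics for a list of run records."""
    latencies = [r["latency_ms"] for r in runs]
    passed = sum(1 for r in runs if r["passed"])
    return {
        "runs": len(runs),
        "pass_rate": passed / len(runs),
        "avg_ms": mean(latencies),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "max_ms": max(latencies),
    }


# Hypothetical sample records; real exports may use different field names.
runs = [
    {"passed": True, "latency_ms": 120},
    {"passed": True, "latency_ms": 180},
    {"passed": False, "latency_ms": 950},
    {"passed": True, "latency_ms": 140},
]
summary = summarize(runs)
```

With a sample this small, p95 and max coincide; the two diverge once a suite has enough runs for the tail to matter.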
Test Results
- Inspect step inputs/outputs and error details per run.
- Use run detail pages to debug and review transcripts.
Tips
- Keep rubrics clear and specific; the judge's reasoning is then easier to use when debugging unexpected failures.
- Export transcripts and results for audits or post-incident reviews.
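For an audit or post-incident review, it can help to reduce exported results to a short list of failures. The sketch below assumes each exported record is a dictionary with hypothetical fields `run_id`, `passed`, `error`, and `judge_reasoning`; adjust the names to match your actual export.

```python
import csv
import io


def failures_to_csv(results):
    """Return a CSV string with one row per failed run."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["run_id", "error", "judge_reasoning"])
    for r in results:
        if not r["passed"]:
            # Missing fields are written as empty cells rather than raising.
            writer.writerow([r["run_id"], r.get("error", ""), r.get("judge_reasoning", "")])
    return buffer.getvalue()


# Hypothetical sample records; real exports may use different field names.
results = [
    {"run_id": "run-1", "passed": True},
    {
        "run_id": "run-2",
        "passed": False,
        "error": "timeout",
        "judge_reasoning": "No answer was produced.",
    },
]
report = failures_to_csv(results)
```

The resulting string can be written to a file and attached to the review alongside the full transcripts.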