Viewing Evaluation Results
After an evaluation run completes, the session grid gains evaluator-specific columns that display the average score for each session or trace. Click a score to drill down into the detailed results. For example, selecting a Trajectory Evaluation score reveals which paths the agent followed and which it missed.
- In the Sessions tab, both session-level and trace-level evaluation results are visible.
- In the Traces tab, only trace-level results are shown.
Together, these views show how agents behave in practice, which makes it easier to debug issues, check quality, and improve performance based on real usage.
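To make the idea of an averaged score column concrete, here is a minimal Python sketch of how a per-session average could be derived from trace-level scores. It is an illustration only: the record fields (`session_id`, `trace_id`, `evaluator`, `score`) and the `average_by_session` helper are hypothetical, and the product may aggregate scores differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trace-level results; field names are illustrative assumptions,
# not the product's actual export format.
trace_results = [
    {"session_id": "s1", "trace_id": "t1", "evaluator": "trajectory", "score": 1.0},
    {"session_id": "s1", "trace_id": "t2", "evaluator": "trajectory", "score": 0.5},
    {"session_id": "s2", "trace_id": "t3", "evaluator": "trajectory", "score": 0.0},
]


def average_by_session(results):
    """Return {(session_id, evaluator): mean score} over trace-level results."""
    grouped = defaultdict(list)
    for row in results:
        grouped[(row["session_id"], row["evaluator"])].append(row["score"])
    return {key: mean(scores) for key, scores in grouped.items()}


session_averages = average_by_session(trace_results)
for (session_id, evaluator), avg in sorted(session_averages.items()):
    print(f"{session_id} {evaluator}: {avg:.2f}")
```

A low average for a session is a cue to drill into its individual traces, since a single failing trace can pull the average down.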