Demystifying Evals for AI Agents

2 points | by vinhnx 18 hours ago

1 comment

  • sorokod 14 hours ago

    "You won't know if your graders are working well unless you read the transcripts and grades from many trials."

    Can anyone recommend tools for reading and analysing transcripts? I assume this refers to the data that CC writes to the .claude directory.