4 points | by _josh_meyer_ 2 hours ago
2 comments
SantaBench, a fun benchmark with a serious methodology. The task: play a cheeky Santa agent who researches users online and roasts them based on their social media.
OP here -- I work at Veris and built this. Happy to answer questions about the methodology!