I hope to help you evaluate your GenAI App

1 point | by shloveai 9 hours ago

2 comments

shloveai 9 hours ago
Someone said 2025 was the year of agents. I feel like 2026 will be the year of evaluation. With more and more agents hitting a wall, and a growing need to keep all data and evaluation local, I built Evalyn, a local-first evaluation pipeline for LLM and agent apps. It traces real executions, evaluates them with suggested metrics and LLM judges, and automatically calibrates those judges using a small amount of human feedback, all without sending data to a SaaS.
It’s open-source, CLI-driven, and meant to make evals something you can actually trust when evolving your GenAI app. I would love to offer white-glove support to anyone who is interested.
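To make the judge-calibration idea concrete, here is a minimal sketch of one way it could work. This is not Evalyn's actual API or algorithm. The function names, the numeric judge score, and the threshold-based approach are all assumptions made for illustration. It picks the pass/fail cutoff on an LLM judge's score that best agrees with a small set of human labels.

```python
from typing import Sequence


def agreement(judge_scores: Sequence[float],
              human_labels: Sequence[bool],
              threshold: float) -> float:
    """Fraction of examples where the judge's pass/fail matches the human label."""
    if not judge_scores or len(judge_scores) != len(human_labels):
        raise ValueError("need equal-length, non-empty scores and labels")
    matches = sum((score >= threshold) == label
                  for score, label in zip(judge_scores, human_labels))
    return matches / len(human_labels)


def calibrate_threshold(judge_scores: Sequence[float],
                        human_labels: Sequence[bool]) -> float:
    """Hypothetical calibration step: try each observed score as the cutoff
    and keep the one with the highest agreement with human feedback.
    Ties go to the lowest such cutoff."""
    candidates = sorted(set(judge_scores))
    return max(candidates,
               key=lambda t: agreement(judge_scores, human_labels, t))


if __name__ == "__main__":
    # Made-up example data: judge scores in [0, 1] and human pass/fail labels.
    scores = [0.2, 0.4, 0.55, 0.7, 0.9]
    labels = [False, False, True, True, True]
    cutoff = calibrate_threshold(scores, labels)
    print(f"cutoff={cutoff}, agreement={agreement(scores, labels, cutoff):.2f}")
```

A real pipeline would likely hold out some human labels to check that the calibrated judge generalizes, rather than measuring agreement on the same examples used to pick the cutoff.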
kundan_s__r 7 hours ago
Please check out verdic.dev