2 comments

  • shloveai 9 hours ago

    Someone said 2025 was the year of agents. I think 2026 will be the year of evaluation. With more and more agents hitting a wall, and a growing need to keep all data and evaluation local, I built Evalyn, a local-first evaluation pipeline for LLM and agent apps. It traces real executions, evaluates them with suggested metrics and LLM judges, and automatically calibrates those judges using a small amount of human feedback, all without sending data to a SaaS.

    It’s open-source, CLI-driven, and meant to make evals something you can actually trust to evolve your GenAI app. I’d love to offer white-glove support to anyone who is interested.

  • kundan_s__r 7 hours ago

    Please check out verdic.dev