Thoth

Public Eval Dashboard

Citation evaluation, in public.

Three golden SLR questions, four metrics, and the latest commit-of-record run for each. Designed so that a regression is a public signal, not a hidden one.

Last run: commit 883ecd5

Citation recall: 75%
Citation precision: 100%
Claim faithfulness: 38%
Expected-claim coverage: 0%

By question

Most recent run per (question × metric).

Question                        Recall  Precision  Faithfulness  Coverage
000-tdd-web-frameworks          100%    100%       43%           0%
001-llm-code-review-security    25%     100%       29%           0%
002-rag-architecture-patterns   100%    100%       42%           0%
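
The dashboard does not say how the four headline figures are derived from the per-question rows. One reading that matches the numbers shown is an unweighted mean across the three questions, rounded to a whole percent. The sketch below checks that reading against the table values; the aggregation rule, the `headline` function, and the short metric keys are assumptions made for illustration, not Thoth's documented method.

```python
from statistics import mean

# Per-question scores in percent, copied from the "By question" table.
# The short metric keys are hypothetical labels chosen for this sketch.
per_question = {
    "000-tdd-web-frameworks": {
        "recall": 100, "precision": 100, "faithfulness": 43, "coverage": 0,
    },
    "001-llm-code-review-security": {
        "recall": 25, "precision": 100, "faithfulness": 29, "coverage": 0,
    },
    "002-rag-architecture-patterns": {
        "recall": 100, "precision": 100, "faithfulness": 42, "coverage": 0,
    },
}


def headline(scores):
    """Assumed rule: unweighted mean over questions, rounded to a whole percent."""
    metrics = next(iter(scores.values())).keys()
    return {
        metric: round(mean(row[metric] for row in scores.values()))
        for metric in metrics
    }


summary = headline(per_question)
for metric, value in summary.items():
    print(f"{metric}: {value}%")
```

Under this assumption, a single weak question moves the headline noticeably: the 25% recall on 001-llm-code-review-security is what brings overall citation recall down to 75%.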