Public Eval Dashboard
Citation evaluation, in public.
3 golden SLR questions, 4 metrics, the latest commit-of-record run for each. Designed so a regression is a public signal — not a hidden one.
Last runcommit883ecd5
Citation recall
75%
Citation precision
100%
Claim faithfulness
38%
Expected-claim coverage
0%
By question
Most recent run per (question × metric).
| Question | Recall | Precision | Faithfulness | Coverage |
|---|---|---|---|---|
| 000-tdd-web-frameworks | 100% | 100% | 43% | 0% |
| 001-llm-code-review-security | 25% | 100% | 29% | 0% |
| 002-rag-architecture-patterns | 100% | 100% | 42% | 0% |