/eval-loop
Eval Loop
Trace quality gaps to root causes, fix iteratively
B2B Tier | Validation & Quality
Overview
Takes a specific quality complaint (thin data, robotic tone, broken UX) and traces it from symptom to structural root cause, then iterates with automated backpressure until measurable targets pass. Works across UX, data, code, and content.
What It Does
- Surfaces and groups symptoms into problem classes, then identifies root causes per class
- Defines measurable targets with automated backpressure (unit tests, Playwright, LLM-as-judge)
- Iterates one fix at a time with verification, logging each pass/fail result to a JSONL audit trail
- Applies product-standard checklists across all pages to catch the same class of problem everywhere
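The iterate-and-verify step above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `Target`, `log_result`, `run_eval_loop`, and the JSONL field names are all hypothetical, and each check is assumed to be a plain callable standing in for a unit test, Playwright run, or LLM-as-judge call.

```python
import json
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, Tuple


@dataclass
class Target:
    """A measurable target: a name plus an automated check returning True on pass."""
    name: str
    check: Callable[[], bool]


def log_result(log_path: Path, iteration: int, fix: str, target: str, passed: bool) -> None:
    """Append one verification record as a single JSON line (hypothetical schema)."""
    record = {
        "ts": time.time(),
        "iteration": iteration,
        "fix": fix,
        "target": target,
        "passed": passed,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def run_eval_loop(
    targets: Iterable[Target],
    fixes: Iterable[Tuple[str, Callable[[], None]]],
    log_path: Path,
) -> bool:
    """Apply one fix at a time, re-run every target after each fix, log the results.

    Returns True as soon as all targets pass, False if the fixes run out first.
    """
    targets = list(targets)
    for iteration, (fix_name, apply_fix) in enumerate(fixes, start=1):
        apply_fix()
        results = [(t.name, t.check()) for t in targets]
        for name, passed in results:
            log_result(log_path, iteration, fix_name, name, passed)
        if all(passed for _, passed in results):
            return True
    return False
```

Re-running every target after each single fix is what supplies the backpressure: a fix that breaks a previously passing target shows up in the log on the same iteration.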
Inputs
- Quality complaints or symptoms
- Current codebase or content
- Definition of "10/10"
Outputs
- eval-session.md (living diagnosis)
- eval-results.jsonl (verification log)
- Passing targets
Example
/eval-loop
A user says "the signal cards are paper-thin." Eval Loop traces that to two root causes: provenance URLs missing from the agent prompt and missing UI components. It then sets Playwright assertions as targets and iterates until every card has a clickable source link.
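The target in this example ("every card has a clickable source link") could be expressed as an automated check along these lines. The page names Playwright assertions; the sketch below swaps in Python's standard-library HTML parser running over static markup so that it stays self-contained. The `signal-card` class name and the `cards_missing_sources` helper are assumptions made for illustration, and cards are assumed to be non-nested container elements.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CardLinkAudit(HTMLParser):
    """Counts cards and how many contain at least one <a> with a non-empty href."""

    def __init__(self, card_class: str = "signal-card"):  # hypothetical class name
        super().__init__()
        self.card_class = card_class
        self.depth = 0          # nesting depth inside the current card (0 = outside)
        self.has_link = False
        self.total = 0
        self.with_link = 0

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self.depth == 0:
            # Outside any card: only a card container opens a new scope.
            if self.card_class in (attr_map.get("class") or "").split():
                self.depth = 1
                self.has_link = False
                self.total += 1
            return
        if tag in VOID_TAGS:
            return
        self.depth += 1
        if tag == "a" and (attr_map.get("href") or "").strip():
            self.has_link = True

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0 and self.has_link:
            self.with_link += 1


def cards_missing_sources(html: str) -> int:
    """Return how many cards lack a source link; the target passes when this is 0."""
    audit = CardLinkAudit()
    audit.feed(html)
    audit.close()
    return audit.total - audit.with_link
```

A check like this only confirms that a link with an href exists in the markup; whether it is actually clickable in the rendered page is something a browser-level tool such as Playwright would need to verify.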
Ready to use /eval-loop?
This skill ships with every Knowledge OS installation. Set up your system in 90 minutes.
Built and maintained by Victor Sowers at STEEPWORKS