/eval-loop

Eval Loop

Trace quality gaps to root causes, fix iteratively

B2B Tier · Validation & Quality

Overview

Takes a specific quality complaint (thin data, robotic tone, broken UX) and traces it from symptom to structural root cause, then iterates with automated backpressure until measurable targets pass. Works across UX, data, code, and content.
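The core loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the `Target` shape, `run_eval_loop`, and the toy fixes are hypothetical names. Each target wraps an automated check (a unit test, Playwright run, or LLM-as-judge call in practice), and the loop applies one fix at a time, re-verifying every target after each change until all pass or the fixes run out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Target:
    """A measurable target backed by an automated check (hypothetical shape)."""
    name: str
    check: Callable[[], bool]  # in practice: wraps a unit test, Playwright run, or LLM judge


def failing_targets(targets: List[Target]) -> List[str]:
    """Return the names of targets whose check currently fails."""
    return [t.name for t in targets if not t.check()]


def run_eval_loop(
    targets: List[Target], fixes: List[Callable[[], None]]
) -> Tuple[bool, List[List[str]]]:
    """Apply one fix at a time, re-verifying every target after each fix.

    Stops as soon as all targets pass or the fixes run out. Returns whether
    everything passed, plus the failing-target list after each verification.
    """
    failing = failing_targets(targets)
    history = [failing]
    for fix in fixes:
        if not failing:
            break
        fix()  # exactly one change per iteration
        failing = failing_targets(targets)
        history.append(failing)
    return not failing, history


# Toy usage: two targets, two fixes; each fix makes one target pass.
state = {"source_links": False, "card_component": False}
targets = [
    Target("cards_have_source_links", lambda: state["source_links"]),
    Target("card_component_renders", lambda: state["card_component"]),
]
fixes = [
    lambda: state.update(source_links=True),
    lambda: state.update(card_component=True),
]
ok, history = run_eval_loop(targets, fixes)
```

Re-running every target after each fix (not just the one being fixed) is what provides the backpressure: a fix that breaks a previously passing target shows up immediately in `history`.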

What It Does

  • Surfaces and groups symptoms into problem classes, then identifies root causes per class
  • Defines measurable targets with automated backpressure (unit tests, Playwright, LLM-as-judge)
  • Iterates one fix at a time with verification, logging each pass/fail result to a JSONL audit trail
  • Applies product-standard checklists across all pages to catch the same class of problem everywhere
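Grouping symptoms by problem class, rather than fixing them one by one, is what lets a single root-cause fix and a checklist pass cover every page where the class appears. The sketch below assumes each symptom has already been tagged with a class label (by a person or an LLM); the record layout and the `group_by_class` helper are hypothetical illustrations.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical symptom records: (page, description, problem_class).
symptoms = [
    ("signals", "card shows no source", "missing_provenance"),
    ("briefs", "claim has no link", "missing_provenance"),
    ("signals", "summary reads robotic", "tone"),
]


def group_by_class(
    records: List[Tuple[str, str, str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Bucket (page, description) pairs under their problem class."""
    classes: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for page, desc, cls in records:
        classes[cls].append((page, desc))
    return dict(classes)


groups = group_by_class(symptoms)
# Pages affected per class: where the checklist pass should look.
pages_by_class = {cls: sorted({page for page, _ in items}) for cls, items in groups.items()}
```

Here two symptoms on different pages collapse into one `missing_provenance` class, so the root-cause search happens once for that class instead of once per complaint.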

Inputs

  • Quality complaints or symptoms
  • Current codebase or content
  • Definition of "10/10"

Outputs

  • eval-session.md (living diagnosis)
  • eval-results.jsonl (verification log)
  • Passing targets
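The verification log is append-only JSONL: one JSON object per line, one line per check. The field names below (`ts`, `iteration`, `target`, `passed`, `note`) and both helpers are assumptions for illustration; the actual schema of `eval-results.jsonl` may differ. Reading the log back and keeping the last record per target gives the current status of each target.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


def log_result(path: Path, iteration: int, target: str, passed: bool, note: str = "") -> None:
    """Append one verification result as a single JSON line (hypothetical schema)."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "iteration": iteration,
        "target": target,
        "passed": passed,
        "note": note,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def latest_status(path: Path) -> Dict[str, bool]:
    """Replay the log; later lines overwrite earlier ones for the same target."""
    status: Dict[str, bool] = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                status[rec["target"]] = rec["passed"]
    return status


# Usage: one failing check, then a passing re-check after a fix.
log_path = Path(tempfile.mkdtemp()) / "eval-results.jsonl"
log_result(log_path, 1, "cards_have_source_links", False, "source links missing")
log_result(log_path, 2, "cards_have_source_links", True)
print(latest_status(log_path))  # {'cards_have_source_links': True}
```

Appending rather than rewriting keeps the full pass/fail history intact as an audit trail, while the replay still answers "what passes right now".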

Example

/eval-loop

A user says "the signal cards are paper thin." The eval loop traces this to missing provenance URLs in the agent prompt and missing UI components, sets Playwright assertions as targets, and iterates until every card has a clickable source link.
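The target in this example is "every card has a clickable source link." The source uses Playwright assertions for this; the sketch below is a dependency-free stand-in that checks the same invariant against rendered HTML using Python's standard `html.parser`. The `signal-card` class name and the `cards_missing_links` helper are hypothetical. An empty result list means the target passes.

```python
from html.parser import HTMLParser
from typing import List, Optional


class CardLinkChecker(HTMLParser):
    """Records, for each card element, whether it contains an <a> with a non-empty href."""

    def __init__(self, card_class: str = "signal-card") -> None:
        super().__init__()
        self.card_class = card_class
        self.card_tag: Optional[str] = None  # tag name of the card we are inside, if any
        self.depth = 0                       # nesting depth of that tag name inside the card
        self.cards: List[bool] = []          # one entry per card: has a source link?

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if self.card_tag is not None:
            # Track only the card's own tag name, so void elements like <img> can't skew depth.
            if tag == self.card_tag:
                self.depth += 1
            if tag == "a" and attr_map.get("href"):
                self.cards[-1] = True
        elif self.card_class in (attr_map.get("class") or "").split():
            self.card_tag, self.depth = tag, 1
            self.cards.append(False)

    def handle_endtag(self, tag):
        if self.card_tag is not None and tag == self.card_tag:
            self.depth -= 1
            if self.depth == 0:
                self.card_tag = None


def cards_missing_links(html: str, card_class: str = "signal-card") -> List[int]:
    """Return the indexes of cards with no clickable source link."""
    checker = CardLinkChecker(card_class)
    checker.feed(html)
    checker.close()
    return [i for i, has_link in enumerate(checker.cards) if not has_link]


page = """
<div class="signal-card"><p>Funding round</p><a href="https://example.com/source">Source</a></div>
<div class="signal-card"><p>Hiring spike</p></div>
"""
print(cards_missing_links(page))  # [1]
```

In a real run the same assertion would execute against the live page in a browser; the point of the sketch is that the target is binary and machine-checkable, so the loop knows exactly when to stop.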


This skill ships with every Knowledge OS installation. Set up your system in 90 minutes.

Built and maintained by Victor Sowers at STEEPWORKS