/autoresearch

Autoresearch

Self-improving skill optimization through scored experiments

Bespoke Tier · Validation & Quality

Overview

Runs automated experiment loops against any skill's SKILL.md, testing hypotheses, scoring outputs with pluggable judges, and committing improvements or reverting regressions. Based on Karpathy's autoresearch pattern, adapted for prompt optimization.

What It Does

  • Forms testable hypotheses from failure modes and applies surgical edits to SKILL.md files
  • Generates skill outputs against a test corpus and scores with anti-slop, structural, and custom judges
  • Commits improvements that pass threshold and reverts regressions automatically via git
  • Detects plateaus through consecutive revert counting and rotates across hypothesis categories
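The loop described above can be sketched in Python. This is a minimal, hypothetical illustration of the control flow, not the skill's actual implementation: the names (`autoresearch`, `score_fn`, `PLATEAU_LIMIT`) are invented, and git commit/revert is stubbed by keeping or discarding the in-memory candidate text.

```python
# Hypothetical sketch of the experiment loop: apply an edit, score it,
# keep it if it improves, discard it otherwise, stop on a plateau.
# Git operations are simulated in memory for illustration.

PLATEAU_LIMIT = 3  # consecutive reverts before declaring a plateau


def autoresearch(skill_md, hypotheses, score_fn, threshold=0.0):
    """Run each (name, edit) hypothesis against skill_md.

    An edit is committed when its score beats the best score by more
    than `threshold`; otherwise it is reverted. Three consecutive
    reverts count as a plateau and end the run.
    """
    best_score = score_fn(skill_md)
    log = []
    consecutive_reverts = 0
    for name, edit in hypotheses:
        candidate = edit(skill_md)      # surgical edit to the SKILL.md text
        score = score_fn(candidate)     # pluggable judge(s)
        if score - best_score > threshold:
            skill_md, best_score = candidate, score   # "commit"
            consecutive_reverts = 0
            log.append((name, score, "committed"))
        else:
            consecutive_reverts += 1                  # "revert"
            log.append((name, score, "reverted"))
            if consecutive_reverts >= PLATEAU_LIMIT:
                log.append((name, best_score, "plateau"))
                break
    return skill_md, best_score, log
```

In a real run the edits would be model-proposed changes to SKILL.md and the commit/revert steps would be actual git operations on an experiment branch; the decision logic, however, is just this keep-or-discard comparison.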

Inputs

  • Skill name
  • Test corpus
  • Judge configuration
  • Iteration count
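A judge configuration might compose several scorers into one weighted aggregate. The sketch below is an assumption about how such a configuration could look; the judge names and the plug-in interface are hypothetical, not the skill's actual API. Each judge maps an output string to a score in [0, 1].

```python
# Hypothetical judge composition. Both judges below are toy heuristics
# standing in for the real anti-slop and structural judges.

def structural_judge(text):
    """Reward presence of required sections."""
    required = ("# Overview", "# Inputs", "# Outputs")
    return sum(section in text for section in required) / len(required)


def anti_slop_judge(text):
    """Penalize filler phrases (toy heuristic)."""
    slop = ("delve", "in today's fast-paced world")
    hits = sum(text.lower().count(phrase) for phrase in slop)
    return max(0.0, 1.0 - 0.5 * hits)


def make_composite(judges):
    """Build a scorer from (judge, weight) pairs."""
    total = sum(weight for _, weight in judges)
    return lambda text: sum(j(text) * w for j, w in judges) / total


score = make_composite([(structural_judge, 2.0), (anti_slop_judge, 1.0)])
```

A custom judge slots in as just another `(callable, weight)` pair, which is what makes the scoring pluggable.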

Outputs

  • Modified SKILL.md (on experiment branch)
  • Experiment log with per-iteration scores

Example

/autoresearch

Run /autoresearch produce-content --iterations 5. It identifies that the skill scores low on structural completeness, adds a required-sections checklist to the SKILL.md, re-scores, and commits the +0.8 improvement. Two other hypotheses regress and get reverted.

Ready to use /autoresearch?

This skill ships with every Knowledge OS installation. Set up your system in 90 minutes.

Built and maintained by Victor Sowers at STEEPWORKS