Autoresearch
Self-improving skill optimization through scored experiments
Bespoke Tier · Validation & Quality
Overview
Runs automated experiment loops against any skill's SKILL.md, testing hypotheses, scoring outputs with pluggable judges, and committing improvements or reverting regressions. Based on Karpathy's autoresearch pattern, adapted for prompt optimization.
What It Does
- Forms testable hypotheses from failure modes and applies surgical edits to SKILL.md files
- Generates skill outputs against a test corpus and scores them with anti-slop, structural, and custom judges
- Commits improvements that pass threshold and reverts regressions automatically via git
- Detects plateaus through consecutive revert counting and rotates across hypothesis categories
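The loop above can be sketched in miniature. This is an illustrative in-memory sketch, not the skill's actual implementation: the judge, the candidate edits, and the plateau limit are all hypothetical stand-ins.

```python
# Minimal sketch of the score/commit/revert loop with plateau detection.
# All names are illustrative; the real skill edits SKILL.md and uses git.
def experiment_loop(text, candidates, judge, plateau_limit=3):
    best_text, best_score = text, judge(text)  # baseline score
    reverts = 0
    for edit in candidates:
        trial = edit(best_text)           # apply a surgical edit
        score = judge(trial)
        if score > best_score:            # improvement: "commit"
            best_text, best_score = trial, score
            reverts = 0
        else:                             # regression: "revert"
            reverts += 1
        if reverts >= plateau_limit:      # consecutive reverts: plateau
            break
    return best_text, best_score

# Toy structural judge: rewards presence of required sections.
judge = lambda t: sum(s in t for s in ("## Inputs", "## Outputs"))
edits = [lambda t: t + "\n## Inputs", lambda t: t + "\n## Outputs"]
result, score = experiment_loop("# SKILL", edits, judge)
```

In the real skill, "commit" and "revert" are git operations on the experiment branch rather than in-memory swaps, and a plateau triggers rotation to a different hypothesis category instead of stopping.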
Inputs
- Skill name
- Test corpus
- Judge configuration
- Iteration count
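The four inputs might be bundled like this. The field names and values below are hypothetical, not the skill's documented schema.

```python
# Hypothetical invocation config for one autoresearch run.
config = {
    "skill": "produce-content",                             # skill whose SKILL.md is optimized
    "corpus": ["tests/brief-01.md", "tests/brief-02.md"],   # test inputs to generate against
    "judges": {"anti_slop": 1.0, "structural": 1.0},        # judge weights
    "iterations": 5,                                        # experiment budget
}
```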
Outputs
- Modified SKILL.md (on experiment branch)
- Experiment log with per-iteration scores
Example
Run /autoresearch produce-content --iterations 5. The loop identifies that the skill scores low on structural completeness, adds a required-sections checklist to the SKILL.md, re-scores, and commits the +0.8 improvement. Two other hypotheses regress and are automatically reverted.
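The commit-or-revert step from the example can be sketched as a thin git wrapper. This is an assumption about the mechanics, run inside the experiment branch; the function name, threshold, and commit message format are illustrative.

```python
# Sketch of the git-backed commit-or-revert step for a tracked skill file.
import subprocess

def commit_or_revert(skill_path, delta, threshold=0.0):
    if delta > threshold:
        # Score improved past threshold: keep the edit.
        subprocess.run(["git", "add", skill_path], check=True)
        subprocess.run(["git", "commit", "-qm", f"autoresearch: +{delta:.1f}"], check=True)
        return "committed"
    # Regression: restore the last committed version of the file.
    subprocess.run(["git", "checkout", "--", skill_path], check=True)
    return "reverted"
```

Because every accepted edit is a commit, the experiment log maps one-to-one onto branch history, and a bad run can be discarded by deleting the branch.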
Ready to use /autoresearch?
This skill ships with every Knowledge OS installation. Set up your system in 90 minutes.
Built and maintained by Victor Sowers at STEEPWORKS