# Self-Scoping Assessment Artifacts

This directory contains repo-scoping's own baseline and assessment artifacts. These files are meant to make scoping-engine changes comparable across releases instead of relying on memory or screenshots.

## Artifact Types

- `golden/repo-scoping-golden-profile.v1.json` is the curated target profile for repo-scoping itself.
- `assessments/repo-scoping-known-bad-2026-05-15-run-39.json` captures the known-bad self-analysis that promoted LLM-provider vocabulary into native repo-scoping capability truth.
- `assessments/repo-scoping-post-wp0015-clean-2026-05-15.json` captures the first clean, release-bound deterministic challenger after acceptance-boundary and input-hygiene work. It remains a rejected regression because candidate generation still collapses repo-scoping's native surfaces under the forbidden provider-routing capability, but its source set no longer includes `var/checkouts/` contamination.
- `workflow.md` explains how to run challenger assessments, interpret outcomes, and decide whether to update the golden profile or fix the engine.
- `outcomes/` stores append-only reviewer decisions created from side-by-side comparisons.
- `../schemas/self-scoping-assessment.schema.json` defines the immutable assessment-run artifact shape.

## Release Binding

Comparable assessment artifacts must bind generated results to the repo-scoping engine release that produced them. A complete binding records package version, engine git commit or release tag, dirty state, scanner version, candidate generator version, quality criteria version, and prompt version when applicable.

The current known-bad artifact is marked `historical_incomplete` because the original database run did not record the engine commit. It remains useful as a negative regression seed, but future challenger artifacts should be fully bound before they are accepted as comparable baselines.
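
To make "complete binding" concrete, here is a minimal bash sketch that assumes the binding fields have already been read into shell variables. The variable names (`PACKAGE_VERSION`, `ENGINE_REF`, and so on) and the `check_binding` function are hypothetical. The authoritative field names live in `../schemas/self-scoping-assessment.schema.json`. The sketch requires a prompt version only when the run was LLM-assisted.

```bash
# Hypothetical sketch: list release-binding fields that are unset or empty.
# Variable names are illustrative assumptions, not the schema's actual keys.
# Requires bash (arrays and ${!name} indirect expansion).

# Usage: check_binding [deterministic|llm]
# Prints one "missing: NAME" line per absent field; returns 1 if any are absent.
check_binding() {
  local mode="${1:-deterministic}"
  # ENGINE_REF holds the engine git commit or release tag;
  # ENGINE_DIRTY records whether the engine checkout had local changes.
  local -a required=(
    PACKAGE_VERSION ENGINE_REF ENGINE_DIRTY SCANNER_VERSION
    CANDIDATE_GENERATOR_VERSION QUALITY_CRITERIA_VERSION
  )
  # A prompt version only applies to LLM-assisted runs.
  if [ "$mode" = "llm" ]; then
    required+=(PROMPT_VERSION)
  fi

  local missing=0 field
  for field in "${required[@]}"; do
    # ${!field:-} expands the variable whose name is stored in $field.
    if [ -z "${!field:-}" ]; then
      echo "missing: $field"
      missing=1
    fi
  done
  return "$missing"
}
```

In this sketch, if every field is set except `ENGINE_REF`, `check_binding` prints `missing: ENGINE_REF` and returns 1. That is the same gap that keeps the known-bad artifact at `historical_incomplete`.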
## Review Use

When the engine changes, run repo-scoping against itself and export a challenger assessment. Compare the challenger to the golden profile and to the negative seed. Reviewers should be able to choose whether the old result, the new result, or neither is better, then store that judgement as a new assessment outcome.

The curator UI exposes this loop at `/ui/self-scoping`. It reads the golden and assessment JSON files from this directory, highlights missing, forbidden, and misplaced hierarchy entries, and records reviewer preference without mutating the compared artifacts. The same page can compare two assessment runs directly so reviewers can choose whether the old baseline or the new challenger is better.

## Export Command

Export a completed analysis run as a challenger artifact:

```bash
repo-scoping export-assessment \
  --repo repo-scoping \
  --analysis-run 39 \
  --output docs/self-scoping/assessments/repo-scoping-challenger-run-39.json
```

The command reads an existing registry database and does not clone or scan the target repository. It records the target analysis metadata, the candidate graph, the approved map at export time, review decisions, fact and content summaries, known regression patterns, and the current repo-scoping engine identity.

Compare an assessment against the curated golden profile:

```bash
repo-scoping compare-assessment \
  --golden docs/self-scoping/golden/repo-scoping-golden-profile.v1.json \
  --assessment docs/self-scoping/assessments/repo-scoping-known-bad-2026-05-15-run-39.json \
  --format markdown
```

The first comparison report highlights missing expected capabilities, forbidden native capabilities, known regression patterns, and misplaced API/CLI features.

Run the full self-assessment loop:

```bash
repo-scoping self-assess \
  --source-path . \
  --assessment-output docs/self-scoping/assessments/repo-scoping-challenger.json \
  --comparison-output docs/self-scoping/assessments/repo-scoping-challenger.md
```

By default, this path is deterministic-only and leaves generated candidates pending review. Add `--with-llm` only when a provider is configured and the run should include LLM-assisted candidate extraction. Add `--fail-on-regression` in CI when known regressions should fail the command. Ordinary `needs_review` comparisons still exit successfully.
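
In CI, the documented `self-assess` flags can be wrapped in a small shell function so that only known regressions fail the job. This is a sketch under stated assumptions: `ci_self_assess` and its output-directory argument are hypothetical. It assumes `repo-scoping` is on `PATH` and treats any non-zero exit as a failure.

```bash
# Hypothetical CI helper: the function name and default output directory are
# assumptions; the repo-scoping flags are the ones documented for self-assess.
# Usage: ci_self_assess [output-dir]
ci_self_assess() {
  local out_dir="${1:-docs/self-scoping/assessments}"
  local json="$out_dir/repo-scoping-challenger.json"
  local report="$out_dir/repo-scoping-challenger.md"

  # Deterministic-only run (no --with-llm). With --fail-on-regression, known
  # regressions produce a non-zero exit; needs_review comparisons still exit 0.
  if ! repo-scoping self-assess \
    --source-path . \
    --assessment-output "$json" \
    --comparison-output "$report" \
    --fail-on-regression; then
    echo "self-assessment failed (known regression or command error); see $report" >&2
    return 1
  fi
  echo "self-assessment passed: $json"
}
```

A CI job could then run `ci_self_assess || exit 1`. Because `--with-llm` is omitted, this run does not need a configured provider.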