# Self-Scoping Assessment Artifacts

This directory contains repo-scoping's own baseline and assessment artifacts.
These files are meant to make scoping-engine changes comparable across releases
instead of relying on memory or screenshots.

## Artifact Types

- `golden/repo-scoping-golden-profile.v1.json` is the curated target profile for
  repo-scoping itself.
- `assessments/repo-scoping-known-bad-2026-05-15-run-39.json` captures the
  known-bad self-analysis that promoted LLM-provider vocabulary into native
  repo-scoping capability truth.
- `assessments/repo-scoping-post-wp0015-clean-2026-05-15.json` captures the
  first clean, release-bound deterministic challenger after acceptance-boundary
  and input-hygiene work. It remains a rejected regression because candidate
  generation still collapses repo-scoping's native surfaces under the forbidden
  provider-routing capability, but its source set no longer includes
  `var/checkouts/` contamination.
- `assessments/repo-scoping-post-wp0016-native-2026-05-15.json` captures the
  first deterministic challenger after native candidate generation recovery. It
  matches every expected capability in the golden profile and has no known
  provider-routing regression, while still leaving generated candidates pending
  review with quality-gate signals.
- `workflow.md` explains how to run challenger assessments, interpret outcomes,
  and decide whether to update the golden profile or fix the engine.
- `outcomes/` stores append-only reviewer decisions created from side-by-side
  comparisons.
- `../schemas/self-scoping-assessment.schema.json` defines the immutable
  assessment-run artifact shape.

## Release Binding

Comparable assessment artifacts must bind generated results to the repo-scoping
engine release that produced them. A complete binding records package version,
engine git commit or release tag, dirty state, scanner version, candidate
generator version, quality criteria version, and prompt version when applicable.

The current known-bad artifact is marked `historical_incomplete` because the
original database run did not record the engine commit. It remains useful as a
negative regression seed, but future challenger artifacts should be fully bound
before they are accepted as comparable baselines.
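As a rough illustration of the completeness rule above, the following sketch lists which binding keys are absent from an assessment file. The function name and the key names in the usage comment are hypothetical stand-ins for the fields listed above; the authoritative names are whatever `../schemas/self-scoping-assessment.schema.json` defines. The check is purely textual and does not parse JSON, so treat it as a quick pre-review aid rather than schema validation.

```bash
# Print every required key that does not appear as a JSON object key in the
# given file. Crude textual match: nesting and value types are ignored.
# Usage (key names are illustrative assumptions, not the real schema):
#   missing_binding_fields assessment.json \
#     package_version engine_commit dirty scanner_version \
#     candidate_generator_version quality_criteria_version
missing_binding_fields() {
  file=$1
  shift
  for key in "$@"; do
    # A key counts as present if `"key":` occurs anywhere in the file.
    grep -q "\"$key\"[[:space:]]*:" "$file" || echo "$key"
  done
}
```

An empty result would suggest the artifact carries every listed field; any printed key marks a gap of the kind that led to the `historical_incomplete` status described above.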

## Review Use

When the engine changes, run repo-scoping against itself and export a challenger
assessment. Compare the challenger to the golden profile and to the negative
seed. Reviewers should be able to choose whether the old result, new result, or
neither is better, then store that judgement as a new assessment outcome.

The curator UI exposes this loop at `/ui/self-scoping`. It reads the golden and
assessment JSON files from this directory, highlights missing, forbidden, and
misplaced hierarchy entries, and records reviewer preference without mutating
the compared artifacts. The same page can compare two assessment runs directly
so reviewers can choose whether the old baseline or new challenger is better.
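The "missing" and "forbidden" highlights amount to set differences and intersections over capability names. The sketch below shows that idea on plain one-name-per-line text files. It is an assumption-laden illustration, not how the curator UI or `compare-assessment` is implemented: the function names are hypothetical, and extracting names from the JSON artifacts into such files is left out. Misplaced hierarchy entries need parent information and are not covered.

```bash
# Names listed in the expected file ($1) that are absent from the actual
# file ($2). -F: fixed strings, -x: whole-line match, -v: invert,
# -f: read patterns from a file.
missing_capabilities() {
  grep -Fxv -f "$2" "$1" || true   # grep exits 1 when nothing is selected
}

# Names in the actual file ($2) that also appear in the forbidden file ($1).
forbidden_capabilities() {
  grep -Fx -f "$1" "$2" || true
}
```

Both functions print matches in the order of the file being filtered, and print nothing when the challenger is clean on that axis.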

## Export Command

Export a completed analysis run as a challenger artifact:

```bash
repo-scoping export-assessment \
  --repo repo-scoping \
  --analysis-run 39 \
  --output docs/self-scoping/assessments/repo-scoping-challenger-run-39.json
```

The command reads an existing registry database and does not clone or scan the
target repository. It records the target analysis metadata, candidate graph,
approved map at export time, review decisions, fact and content summaries, known
regression patterns, and current repo-scoping engine identity.

Compare an assessment against the curated golden profile:

```bash
repo-scoping compare-assessment \
  --golden docs/self-scoping/golden/repo-scoping-golden-profile.v1.json \
  --assessment docs/self-scoping/assessments/repo-scoping-known-bad-2026-05-15-run-39.json \
  --format markdown
```

The first comparison report highlights missing expected capabilities, forbidden
native capabilities, known regression patterns, and misplaced API/CLI features.

Run the full self-assessment loop:

```bash
repo-scoping self-assess \
  --source-path . \
  --assessment-output docs/self-scoping/assessments/repo-scoping-challenger.json \
  --comparison-output docs/self-scoping/assessments/repo-scoping-challenger.md
```

By default this path is deterministic-only and leaves generated candidates
pending review. Add `--with-llm` only when a provider is configured and the run
should include LLM-assisted candidate extraction. Add `--fail-on-regression` in
CI when known regressions should fail the command; ordinary `needs_review`
comparisons still exit successfully.
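A CI step built on that behavior might look like the sketch below. Only the `self-assess` subcommand and its flags come from this README; the `self_assess_gate` function, the `REPO_SCOPING_BIN` override, and the optional output-directory argument are hypothetical conveniences. The sketch assumes nothing about specific exit codes beyond zero versus non-zero, so a non-zero status may indicate a known regression or any other command failure.

```bash
# Hypothetical CI gate: run the deterministic self-assessment and propagate
# a non-zero exit status so the pipeline step fails.
self_assess_gate() {
  bin=${REPO_SCOPING_BIN:-repo-scoping}          # assumed override for testing
  out_dir=${1:-docs/self-scoping/assessments}    # default mirrors the paths above
  if "$bin" self-assess \
      --source-path . \
      --assessment-output "$out_dir/repo-scoping-challenger.json" \
      --comparison-output "$out_dir/repo-scoping-challenger.md" \
      --fail-on-regression; then
    echo "self-assess: completed without a failing status"
  else
    status=$?
    echo "self-assess: failed with exit status $status" >&2
    return "$status"
  fi
}
```

Because ordinary `needs_review` comparisons exit successfully, a passing gate does not mean the challenger was accepted; the comparison report still needs a reviewer.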