Add self-scoping review UI

2026-05-15 14:56:53 +02:00
parent fc034bd821
commit f690794acd
9 changed files with 1185 additions and 8 deletions

@@ -13,6 +13,8 @@ instead of relying on memory or screenshots.
 repo-scoping capability truth.
 - `workflow.md` explains how to run challenger assessments, interpret outcomes,
 and decide whether to update the golden profile or fix the engine.
+- `outcomes/` stores append-only reviewer decisions created from side-by-side
+comparisons.
 - `../schemas/self-scoping-assessment.schema.json` defines the immutable
 assessment-run artifact shape.
@@ -35,6 +37,12 @@ assessment. Compare the challenger to the golden profile and to the negative
 seed. Reviewers should be able to choose whether the old result, new result, or
 neither is better, then store that judgement as a new assessment outcome.
+The curator UI exposes this loop at `/ui/self-scoping`. It reads the golden and
+assessment JSON files from this directory, highlights missing, forbidden, and
+misplaced hierarchy entries, and records reviewer preference without mutating
+the compared artifacts. The same page can compare two assessment runs directly
+so reviewers can choose whether the old baseline or new challenger is better.
 ## Export Command
 Export a completed analysis run as a challenger artifact:
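
Review note: the comparison described in the added paragraph above (missing, forbidden, and misplaced hierarchy entries) could be sketched roughly as follows. This is a minimal illustration, not the curator UI's code. The JSON shapes (`expected`, `forbidden`, `entries`) and the names `Comparison` and `compare` are assumptions made for the example; the real schema lives in `self-scoping-assessment.schema.json` and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Comparison:
    """Findings from comparing one assessment against a golden profile."""
    missing: list[str] = field(default_factory=list)
    forbidden: list[str] = field(default_factory=list)
    misplaced: list[str] = field(default_factory=list)


def compare(golden: dict, assessment: dict) -> Comparison:
    """Classify hierarchy entries without mutating either input.

    Hypothetical shapes (assumed for this sketch only):
      golden     = {"expected": {name: parent}, "forbidden": [name, ...]}
      assessment = {"entries": {name: parent}}
    """
    expected = golden.get("expected", {})
    forbidden = set(golden.get("forbidden", []))
    found = assessment.get("entries", {})

    result = Comparison()
    for name, parent in sorted(expected.items()):
        if name not in found:
            # Expected by the golden profile but absent from the run.
            result.missing.append(name)
        elif found[name] != parent:
            # Present, but attached under a different parent.
            result.misplaced.append(name)
    # Entries the golden profile explicitly rules out.
    result.forbidden = sorted(name for name in found if name in forbidden)
    return result
```

Because `compare` only reads its arguments, the same function could serve both the golden-versus-assessment view and, applied once per run, the two-run comparison.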

@@ -0,0 +1,9 @@
+# Self-Scoping Review Outcomes
+This directory stores append-only review decisions recorded from the
+self-scoping comparison UI. Outcome files bind a reviewer choice to a golden
+profile, an assessment artifact, and the repo-scoping engine identity captured
+in that assessment.
+Do not edit historical assessment artifacts to record a preference. Add a new
+outcome record instead.
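
Review note: one way to keep outcome records append-only is to create each file exclusively, so an existing record can never be overwritten. The sketch below is an illustration under stated assumptions, not the UI's actual writer: the function name `record_outcome`, the field names, and the timestamp-based file name are all hypothetical. Only the three choices (old, new, neither) and the bound identities come from the docs in this commit.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Choices named in the workflow docs: old result, new result, or neither.
VALID_CHOICES = {"old", "new", "neither"}


def record_outcome(outcomes_dir: Path, *, golden_id: str, assessment_id: str,
                   engine_id: str, choice: str) -> Path:
    """Write one new outcome file; never modify an existing one."""
    if choice not in VALID_CHOICES:
        raise ValueError(f"unknown choice: {choice!r}")

    outcomes_dir.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    # Hypothetical naming scheme: UTC timestamp plus the assessment id.
    path = outcomes_dir / f"{now.strftime('%Y%m%dT%H%M%S%fZ')}-{assessment_id}.json"

    record = {
        "golden_profile": golden_id,   # assumed field names
        "assessment": assessment_id,
        "engine": engine_id,
        "choice": choice,
        "recorded_at": now.isoformat(),
    }
    # Mode "x" fails with FileExistsError instead of overwriting.
    with path.open("x", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2, sort_keys=True)
    return path
```

Exclusive creation pushes the append-only rule into the filesystem call itself, so a naming collision surfaces as an error rather than a silent rewrite.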

@@ -56,17 +56,25 @@ are committed.
 2. Read the comparison report.
-3. If the report says `regression`, inspect forbidden capabilities, misplaced
+3. Open the curator UI at `/ui/self-scoping` to compare the golden profile and
+assessment artifact side by side.
+4. When an earlier baseline assessment exists, use the same page's two-run
+comparison to judge old output against the new challenger.
+5. If the report says `regression`, inspect forbidden capabilities, misplaced
 features, and known regression patterns first.
-4. If the report says `needs_review`, inspect missing expected capabilities and
+6. If the report says `needs_review`, inspect missing expected capabilities and
 source evidence before choosing old or new output.
-5. If the report says `candidate_improvement`, still confirm that the
+7. If the report says `candidate_improvement`, still confirm that the
 hierarchy, source refs, and native-utility boundaries make sense.
-6. Record the decision as an assessment outcome before changing the active
-baseline.
+8. Record the decision as an assessment outcome before changing the active
+baseline. The UI writes append-only outcome records under
+`docs/self-scoping/outcomes/`; it does not rewrite historical assessment or
+golden-profile artifacts.
 ## CI Use
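
Review note on the hunk above: the status-specific steps in the renumbered list amount to a lookup from report status to what a reviewer inspects first. A small sketch of that mapping is below. The three status strings and the inspection items are taken from the workflow text; the names `INSPECT_FIRST` and `review_checklist` are hypothetical and not part of the repository.

```python
# Report status -> what the workflow says to inspect, in order.
INSPECT_FIRST = {
    "regression": [
        "forbidden capabilities",
        "misplaced features",
        "known regression patterns",
    ],
    "needs_review": [
        "missing expected capabilities",
        "source evidence",
    ],
    "candidate_improvement": [
        "hierarchy",
        "source refs",
        "native-utility boundaries",
    ],
}


def review_checklist(status: str) -> list[str]:
    """Return a copy of the inspection list for a comparison-report status."""
    try:
        return list(INSPECT_FIRST[status])
    except KeyError:
        # Unknown statuses are rejected rather than silently skipped.
        raise ValueError(f"unrecognized report status: {status!r}") from None
```

Whatever the status, the final step in the list still applies: record the decision as an assessment outcome before changing the active baseline.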