
# Self-Scoping Assessment Workflow

Self-scoping is the feedback loop for improving repo-scoping with evidence. The
loop is simple: run the current engine against repo-scoping itself, compare the
result to a curated golden profile and known bad runs, then record whether the
new result is better.
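The comparison step can be pictured as set arithmetic over capability names. The sketch below is hypothetical. It assumes that the golden profile lists `expected` and `forbidden` capability names and that known bad runs reduce to a set of capability names. It also assumes a verdict order of `regression`, then `needs_review`, then `candidate_improvement`. The real report format and verdict rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class GoldenProfile:
    # Hypothetical shape; the real golden profile JSON may be structured differently.
    expected: set[str]
    forbidden: set[str] = field(default_factory=set)


@dataclass
class Comparison:
    missing_expected: set[str]
    forbidden_present: set[str]
    known_bad_matches: set[str]
    verdict: str


def compare(challenger: set[str], golden: GoldenProfile, known_bad: set[str]) -> Comparison:
    """Compare challenger capability names to a golden profile and known bad runs."""
    missing = golden.expected - challenger
    forbidden = challenger & golden.forbidden
    bad = challenger & known_bad
    # Assumed ordering: forbidden or known-bad capabilities are regressions,
    # missing expected capabilities need review, anything else may be an improvement.
    if forbidden or bad:
        verdict = "regression"
    elif missing:
        verdict = "needs_review"
    else:
        verdict = "candidate_improvement"
    return Comparison(missing, forbidden, bad, verdict)
```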

## Outcome Terms

- `baseline`: a result accepted as a reference point for later comparisons.
- `challenger`: a fresh result from a new engine version or configuration.
- `preferred`: the reviewer chose this result over the prior baseline.
- `tied`: the reviewer judged old and new results roughly equivalent.
- `rejected`: the result is known bad and should not become baseline truth.
- `superseded`: the result used to be useful but was replaced by a newer
  preferred assessment.
- `needs-human`: the result cannot be judged confidently without curator
  review.

The artifact from the 2026-05-15 run 39 is a known `rejected` result. It serves
as a negative regression seed, not a baseline to imitate.
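The outcome terms can be modeled as a closed vocabulary. The enum values below copy the terms above. The `can_promote_to_baseline` rule is an assumption: it treats only a reviewer-`preferred` result as eligible to become the next baseline, which rules out `rejected` results such as the run 39 artifact.

```python
from enum import Enum


class Outcome(str, Enum):
    """Outcome terms from this workflow, as a hypothetical Python enum."""
    BASELINE = "baseline"
    CHALLENGER = "challenger"
    PREFERRED = "preferred"
    TIED = "tied"
    REJECTED = "rejected"
    SUPERSEDED = "superseded"
    NEEDS_HUMAN = "needs-human"


# Assumed policy: only a result the reviewer preferred over the prior baseline
# may be promoted; rejected, tied, or unjudged results keep the old baseline.
PROMOTABLE = frozenset({Outcome.PREFERRED})


def can_promote_to_baseline(outcome: Outcome) -> bool:
    return outcome in PROMOTABLE
```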

## Release Binding

Assessment output is only useful if it is bound to the engine that generated it.
Comparable challenger artifacts should record:

- repo-scoping package version
- engine git commit
- engine release or tag when available
- engine dirty state
- scanner version
- candidate generator version
- quality criteria version
- prompt version when LLM or agentic review is used

An artifact with `release_binding_status=complete` can be compared as a real
challenger. An artifact with `release_binding_status=historical_incomplete` can
still be useful as a negative seed, but it should not become a preferred
baseline. An `unbound` artifact is diagnostic only.

Dirty state does not automatically make an artifact useless, but it must be
visible. A dirty challenger should usually be rerun after the relevant changes
are committed.
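The rules above can be summarized as a lookup from binding status to the roles an artifact may play. In the sketch below, dirty state is surfaced as a rerun recommendation rather than treated as fatal. The role names and the `eligibility` helper are hypothetical.

```python
from typing import NamedTuple


class Eligibility(NamedTuple):
    roles: frozenset[str]
    rerun_recommended: bool


def eligibility(status: str, dirty: bool) -> Eligibility:
    """Hypothetical mapping from release_binding_status and dirty state to allowed uses."""
    if status == "complete":
        roles = frozenset({"challenger", "preferred_baseline", "negative_seed", "diagnostic"})
    elif status == "historical_incomplete":
        roles = frozenset({"negative_seed", "diagnostic"})
    elif status == "unbound":
        roles = frozenset({"diagnostic"})
    else:
        raise ValueError(f"unknown release_binding_status: {status!r}")
    # Dirty state stays visible: it does not remove roles, but a rerun after
    # committing the relevant changes is recommended.
    return Eligibility(roles, rerun_recommended=dirty)
```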

## Standard Loop

1. Run the self-assessment command:

   ```bash
   repo-scoping self-assess \
     --source-path . \
     --assessment-output docs/self-scoping/assessments/repo-scoping-challenger.json \
     --comparison-output docs/self-scoping/assessments/repo-scoping-challenger.md
   ```

2. Read the comparison report written to the `--comparison-output` path.

3. Open the curator UI at `/ui/self-scoping` to compare the golden profile and
   assessment artifact side by side.

4. When an earlier baseline assessment exists, use the same page's two-run
   comparison to judge old output against the new challenger.

5. If the report says `regression`, inspect forbidden capabilities, misplaced
   features, and known regression patterns first.

6. If the report says `needs_review`, inspect missing expected capabilities and
   source evidence before choosing old or new output.

7. If the report says `candidate_improvement`, still confirm that the
   hierarchy, source refs, and native-utility boundaries make sense.

8. Record the decision as an assessment outcome before changing the active
   baseline. The UI writes append-only outcome records under
   `docs/self-scoping/outcomes/`; it does not rewrite historical assessment or
   golden-profile artifacts.
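Steps 5 through 8 can be sketched as a triage checklist plus an append-only writer. The writer creates each outcome record with Python's exclusive-create mode (`"x"`), so an existing record is never overwritten. The file naming scheme, the record fields, and `write_outcome` are hypothetical and do not describe the UI's actual format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# What to inspect first for each report verdict (steps 5-7 above).
TRIAGE = {
    "regression": ["forbidden capabilities", "misplaced features", "known regression patterns"],
    "needs_review": ["missing expected capabilities", "source evidence"],
    "candidate_improvement": ["hierarchy", "source refs", "native-utility boundaries"],
}


def write_outcome(outcomes_dir: Path, assessment: str, outcome: str, rationale: str) -> Path:
    """Append a new outcome record; never rewrite an existing one."""
    outcomes_dir.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    # Hypothetical naming scheme: UTC timestamp plus outcome term.
    path = outcomes_dir / f"{now.strftime('%Y%m%dT%H%M%S%fZ')}-{outcome}.json"
    record = {
        "assessment": assessment,
        "outcome": outcome,
        "rationale": rationale,
        "recorded_at": now.isoformat(),
    }
    # Mode "x" raises FileExistsError instead of replacing an existing file.
    with path.open("x", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
    return path
```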

## CI Use

Add `--fail-on-regression` only when a regression should make the command fail,
for example to block a CI job:

```bash
repo-scoping self-assess \
  --source-path . \
  --comparison-output /tmp/repo-scoping-self-assessment.md \
  --fail-on-regression
```

The command should not fail for ordinary `needs_review` results. Review-needed
output is signal, not a broken build.
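The intended exit behavior fits in one small function. This sketch assumes a failure is reported as exit code `1`, which the workflow does not specify. Only a `regression` verdict combined with the flag fails, and `needs_review` never does. The function name is hypothetical.

```python
def exit_code(verdict: str, fail_on_regression: bool) -> int:
    """Sketch of how --fail-on-regression could map a verdict to a process exit code."""
    if verdict == "regression" and fail_on_regression:
        return 1  # assumed non-zero code for a blocking regression
    # needs_review and candidate_improvement are signal, not a broken build.
    return 0
```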

## Updating The Golden Profile

Update `golden/repo-scoping-golden-profile.v1.json` when the repository's real
product utility has changed. Examples:

- repo-scoping adds a genuinely new user-facing capability.
- a capability is renamed after curator agreement.
- a former out-of-scope behavior becomes product intent and has supporting
  implementation evidence.

Do not update the golden profile just because the engine failed to find an
expected capability. That is usually an engine issue.

## Fixing The Engine

Fix the engine when a challenger:

- repeats a known regression pattern
- promotes dependency, fixture, schema, scanner-rule, or workplan vocabulary as
  native capability truth
- places features under a capability they do not support
- loses source refs or cites evidence that does not support the abstraction
- relies on generated `SCOPE.md` as primary proof for rebuilding the same model

The 2026-05-15 run 39 failure is the canonical example: provider vocabulary from
scanner code, tests, fixtures, and schema examples became the false native
capability `Route LLM Requests Across Providers`. The correct action is to fix
scanner/generator/acceptance behavior, not to teach the golden profile that
repo-scoping is an LLM router.
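Some of these engine-fix triggers can be checked mechanically per capability. The sketch below flags three cases: capabilities with no source refs, capabilities whose only evidence is `SCOPE.md`, and capabilities whose refs all sit under test, fixture, or schema directories. The directory names and the `Capability` shape are assumptions. A path heuristic like this would not catch vocabulary that leaks from scanner code, which needs scanner-aware checks.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical directory names treated as non-native evidence.
NON_NATIVE_DIRS = {"tests", "fixtures", "schemas"}


@dataclass
class Capability:
    name: str
    source_refs: list[str]


def is_non_native(ref: str) -> bool:
    return bool(NON_NATIVE_DIRS & set(PurePosixPath(ref).parts))


def engine_issues(cap: Capability) -> list[str]:
    """Flag a few engine-fix triggers for one capability (not an exhaustive check)."""
    if not cap.source_refs:
        return ["no source refs"]
    if all(PurePosixPath(ref).name == "SCOPE.md" for ref in cap.source_refs):
        return ["only evidence is generated SCOPE.md"]
    if all(is_non_native(ref) for ref in cap.source_refs):
        return ["evidence comes only from tests, fixtures, or schemas"]
    return []
```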

## Relationship To Agentic Acceptance

Deterministic assessment can reject, downgrade, or flag output with transparent
criteria. It should not approve candidate characteristics as registry truth.
When automation stands in for human review, the decision belongs to an agentic
reviewer that inspects evidence, applies versioned criteria, and records a
rationale. That acceptance redesign is tracked in `RREG-WP-0014`.
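The boundary between deterministic assessment and acceptance can be enforced in the type itself: give the deterministic gate no way to approve. In the sketch below, even a clean result is only handed on for review. The enum members, inputs, and thresholds are hypothetical stand-ins for versioned criteria.

```python
from enum import Enum


class GateResult(Enum):
    """Deterministic outcomes; deliberately no APPROVE member."""
    REJECT = "reject"
    DOWNGRADE = "downgrade"
    FLAG = "flag"
    PASS_TO_REVIEW = "pass_to_review"


def deterministic_gate(forbidden_hits: int, missing_source_refs: int, weak_evidence: int) -> GateResult:
    # Hypothetical criteria; real rules would come from versioned quality criteria.
    if forbidden_hits:
        return GateResult.REJECT
    if missing_source_refs:
        return GateResult.DOWNGRADE
    if weak_evidence:
        return GateResult.FLAG
    # A clean result still goes to an agentic or human reviewer; it is never approved here.
    return GateResult.PASS_TO_REVIEW
```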